Running LLMs Decentralized: Browser Inference and Internet Computer Hosting
Build privacy-first AI applications with client-side LLMs using Transformers.js, Ollama.js for local models, and decentralized hosting on the Internet Computer

Modern web applications can now leverage powerful AI capabilities directly in the browser or through decentralized hosting. This guide explores three approaches: Ollama.js for connecting to local models, Transformers.js for pure client-side inference, and Internet Computer canisters for decentralized AI hosting.
New to Ollama? Start with Running LLMs Locally with Ollama - Privacy-First AI Development to learn the CLI basics, model management, and local development workflows before diving into web integration.
Why Run LLMs in the Browser?
Benefits
- Privacy: User data never leaves their device
- Zero Backend Costs: No API fees or server infrastructure
- Offline Capability: Works without internet connection
- Low Latency: No network round-trips
- Scalability: Computation distributed across users
Use Cases
- Private chat interfaces
- Content generation tools
- Code assistants
- Text analysis and summarization
- Image captioning and understanding
- Translation services
- Sentiment analysis
Approach 1: Ollama.js (Local Model Access)
Ollama.js lets your web app connect to a locally running Ollama server, combining browser convenience with powerful local models.
Installation
npm install ollama
Basic Setup
import { Ollama } from 'ollama'
// Connect to local Ollama server
const ollama = new Ollama({ host: 'http://localhost:11434' })
// Generate text
async function generate(prompt) {
const response = await ollama.generate({
model: 'llama3.2',
prompt: prompt,
})
return response.response
}
// Usage
const result = await generate('Explain async/await in JavaScript')
console.log(result)
Streaming Responses
For real-time output:
async function streamGenerate(prompt, onChunk) {
const response = await ollama.generate({
model: 'llama3.2',
prompt: prompt,
stream: true,
})
for await (const part of response) {
onChunk(part.response)
}
}
// Usage in React
function ChatComponent() {
const [output, setOutput] = useState('')
const handleSubmit = async (prompt) => {
setOutput('')
await streamGenerate(prompt, (chunk) => {
setOutput(prev => prev + chunk)
})
}
return (
<div>
<div>{output}</div>
<button onClick={() => handleSubmit('Write a poem')}>
Generate
</button>
</div>
)
}
Chat Interface
Build conversational interfaces:
async function chat(messages) {
const response = await ollama.chat({
model: 'llama3.2',
messages: messages,
})
return response.message
}
// Usage
const conversation = [
{ role: 'user', content: 'What is recursion?' },
{ role: 'assistant', content: 'Recursion is when a function calls itself...' },
{ role: 'user', content: 'Give me an example in Python' }
]
const reply = await chat(conversation)
console.log(reply.content)
Vision Models
Process images with multi-modal models:
// Passing a file path in `images` is resolved by the Node.js client; in the browser, pass base64 (see below)
async function analyzeImage(imagePath, prompt) {
const response = await ollama.generate({
model: 'llava',
prompt: prompt,
images: [imagePath]
})
return response.response
}
// With base64 encoded images
async function analyzeBase64Image(base64Image, prompt) {
const response = await ollama.generate({
model: 'llava',
prompt: prompt,
images: [base64Image]
})
return response.response
}
// Usage in web app
const fileInput = document.getElementById('imageInput')
fileInput.addEventListener('change', async (e) => {
const file = e.target.files[0]
const reader = new FileReader()
reader.onload = async (event) => {
const base64 = event.target.result.split(',')[1]
const description = await analyzeBase64Image(
base64,
'Describe this image in detail'
)
console.log(description)
}
reader.readAsDataURL(file)
})
React Example - Complete Chat App
import { useState } from 'react'
import { Ollama } from 'ollama'
function ChatApp() {
const [messages, setMessages] = useState([])
const [input, setInput] = useState('')
const [loading, setLoading] = useState(false)
const ollama = new Ollama({ host: 'http://localhost:11434' })
const sendMessage = async () => {
if (!input.trim()) return
const userMessage = { role: 'user', content: input }
setMessages(prev => [...prev, userMessage])
setInput('')
setLoading(true)
try {
const response = await ollama.chat({
model: 'llama3.2',
messages: [...messages, userMessage],
stream: true,
})
let assistantMessage = { role: 'assistant', content: '' }
setMessages(prev => [...prev, assistantMessage])
for await (const part of response) {
assistantMessage.content += part.message.content
setMessages(prev => {
const updated = [...prev]
updated[updated.length - 1] = { ...assistantMessage }
return updated
})
}
} catch (error) {
console.error('Error:', error)
} finally {
setLoading(false)
}
}
return (
<div className="chat-container">
<div className="messages">
{messages.map((msg, idx) => (
<div key={idx} className={`message ${msg.role}`}>
<strong>{msg.role}:</strong> {msg.content}
</div>
))}
</div>
<div className="input-area">
<input
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyPress={(e) => e.key === 'Enter' && sendMessage()}
placeholder="Type a message..."
disabled={loading}
/>
<button onClick={sendMessage} disabled={loading}>
{loading ? 'Sending...' : 'Send'}
</button>
</div>
</div>
)
}
export default ChatApp
CORS Configuration
Enable CORS for browser access:
# Set Ollama origins environment variable
export OLLAMA_ORIGINS="http://localhost:3000,http://localhost:5173"
# Then start Ollama
ollama serve
Or make it persistent by setting OLLAMA_ORIGINS in the environment the Ollama service reads at startup:
# macOS (Ollama menu-bar app): set a login-wide variable, then restart Ollama
launchctl setenv OLLAMA_ORIGINS "http://localhost:3000,http://localhost:5173"
# Linux (systemd service): add an override via `systemctl edit ollama.service`
# Environment="OLLAMA_ORIGINS=http://localhost:3000,http://localhost:5173"
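To confirm the browser can actually reach the server with CORS enabled, a quick check against Ollama's /api/tags endpoint (which lists locally installed models) can be run from the page or the devtools console:
// Sanity check: if CORS is configured correctly, this returns the installed models
const res = await fetch('http://localhost:11434/api/tags')
const { models } = await res.json()
console.log(models.map((m) => m.name))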
Approach 2: Transformers.js (In-Browser Inference)
Transformers.js runs models entirely in the browser using WebAssembly (with experimental WebGPU support in newer releases), with no backend required.
Installation
npm install @xenova/transformers
Basic Text Generation
import { pipeline } from '@xenova/transformers'
// Create a text generation pipeline
const generator = await pipeline('text-generation', 'Xenova/gpt2')
// Generate text
const output = await generator('Once upon a time', {
max_new_tokens: 50,
temperature: 0.7,
})
console.log(output[0].generated_text)
Sentiment Analysis
import { pipeline } from '@xenova/transformers'
// Create sentiment analyzer
const classifier = await pipeline(
'sentiment-analysis',
'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
)
// Analyze text
const result = await classifier('I love this product!')
console.log(result)
// [{ label: 'POSITIVE', score: 0.9998 }]
Text Summarization
import { pipeline } from '@xenova/transformers'
const summarizer = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6')
const text = `
The Transformer architecture was introduced in the paper "Attention is All You Need"
in 2017. It revolutionized natural language processing by replacing recurrent neural
networks with self-attention mechanisms, enabling parallel processing of sequences.
`
const summary = await summarizer(text, {
max_length: 50,
min_length: 10,
})
console.log(summary[0].summary_text)
Translation
import { pipeline } from '@xenova/transformers'
// English to German
const translator = await pipeline(
'translation',
'Xenova/nllb-200-distilled-600M'
)
const result = await translator('Hello, how are you?', {
src_lang: 'eng_Latn',
tgt_lang: 'deu_Latn',
})
console.log(result[0].translation_text)
// "Hallo, wie geht es dir?"
Question Answering
import { pipeline } from '@xenova/transformers'
const qa = await pipeline('question-answering', 'Xenova/distilbert-base-cased-distilled-squad')
const context = `
The Eiffel Tower is located in Paris, France. It was completed in 1889
and stands 330 meters tall. It was designed by Gustave Eiffel.
`
const answer = await qa({
question: 'How tall is the Eiffel Tower?',
context: context,
})
console.log(answer.answer) // "330 meters"
Image Classification
import { pipeline } from '@xenova/transformers'
// Load image classifier
const classifier = await pipeline(
'image-classification',
'Xenova/vit-base-patch16-224'
)
// From URL
const result = await classifier('https://example.com/cat.jpg')
console.log(result)
// [{ label: 'tabby cat', score: 0.95 }, ...]
// From a file input (pass an object URL, which the pipeline accepts like any other URL)
const fileInput = document.getElementById('imageInput')
fileInput.addEventListener('change', async (e) => {
  const file = e.target.files[0]
  const url = URL.createObjectURL(file)
  const result = await classifier(url)
  URL.revokeObjectURL(url)
  console.log(result)
})
Feature Extraction / Embeddings
import { pipeline } from '@xenova/transformers'
// Create embeddings for semantic search
const extractor = await pipeline(
'feature-extraction',
'Xenova/all-MiniLM-L6-v2'
)
const embeddings = await extractor('This is a sample sentence', {
pooling: 'mean',
normalize: true,
})
console.log(embeddings.data) // Float32Array of embeddings
React Example - Text Summarizer
import { useState, useEffect } from 'react'
import { pipeline } from '@xenova/transformers'
function TextSummarizer() {
const [summarizer, setSummarizer] = useState(null)
const [input, setInput] = useState('')
const [summary, setSummary] = useState('')
const [loading, setLoading] = useState(false)
const [modelLoading, setModelLoading] = useState(true)
useEffect(() => {
async function loadModel() {
const model = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6')
setSummarizer(model)
setModelLoading(false)
}
loadModel()
}, [])
const handleSummarize = async () => {
if (!input.trim() || !summarizer) return
setLoading(true)
try {
const result = await summarizer(input, {
max_length: 100,
min_length: 30,
})
setSummary(result[0].summary_text)
} catch (error) {
console.error('Summarization error:', error)
} finally {
setLoading(false)
}
}
if (modelLoading) {
return <div>Loading model...</div>
}
return (
<div className="summarizer">
<h2>Text Summarizer</h2>
<textarea
value={input}
onChange={(e) => setInput(e.target.value)}
placeholder="Paste text to summarize..."
rows={10}
/>
<button onClick={handleSummarize} disabled={loading}>
{loading ? 'Summarizing...' : 'Summarize'}
</button>
{summary && (
<div className="summary">
<h3>Summary:</h3>
<p>{summary}</p>
</div>
)}
</div>
)
}
export default TextSummarizer
Available Models
Popular models on Hugging Face compatible with Transformers.js:
Text Generation:
- Xenova/gpt2
- Xenova/distilgpt2
- Xenova/LaMini-Flan-T5-783M
Classification:
- Xenova/distilbert-base-uncased-finetuned-sst-2-english (sentiment)
- Xenova/toxic-bert (toxicity detection)
Summarization:
- Xenova/distilbart-cnn-6-6
- Xenova/distilbart-cnn-12-6
Translation:
- Xenova/nllb-200-distilled-600M (200 languages)
Embeddings:
- Xenova/all-MiniLM-L6-v2
- Xenova/all-mpnet-base-v2
Vision:
- Xenova/vit-base-patch16-224 (image classification)
- Xenova/clip-vit-base-patch32 (image-text matching)
Browse all models: Hugging Face Transformers.js
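Any entry above can be loaded by task name and model id. As a minimal sketch, here is the toxicity model from the list (assuming the standard text-classification task for it):
import { pipeline } from '@xenova/transformers'

// Sketch: toxicity detection with the model listed above
const toxicity = await pipeline('text-classification', 'Xenova/toxic-bert')
const result = await toxicity('You are wonderful')
console.log(result) // [{ label, score }, ...]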
Performance Optimization
Transformers.js Optimization
// Tune the ONNX Runtime WASM backend used for in-browser inference
import { env } from '@xenova/transformers'
env.backends.onnx.wasm.numThreads = 4 // Use multiple threads
env.backends.onnx.wasm.proxy = false // Run inference on the main thread instead of a web worker
// In the browser, downloaded models are cached via the Cache API (env.useBrowserCache, on by default);
// env.cacheDir only applies when running under Node.js
env.cacheDir = './.cache'
Lazy Loading
// Load model only when needed
let model = null
async function getModel() {
if (!model) {
model = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english')
}
return model
}
// Usage
const classifier = await getModel()
const result = await classifier('Great product!')
Progress Tracking
import { pipeline } from '@xenova/transformers'
// Track download progress via the pipeline's progress_callback option
const model = await pipeline('text-generation', 'Xenova/gpt2', {
  progress_callback: (progress) => {
    if (progress.status === 'progress') {
      console.log(`Loading: ${progress.file} - ${Math.round(progress.progress)}%`)
    }
  },
})
Comparison: Browser Inference vs Local Models vs Decentralized Hosting
| Feature | Transformers.js | Ollama.js | Internet Computer |
|---|---|---|---|
| Setup | Pure browser, no backend | Requires Ollama server | Deploy to ICP canisters |
| Model Size | Small-Medium (10MB-500MB) | Large (1GB-7GB+) | Flexible (canister limits) |
| Performance | Moderate (WASM/WebGPU) | Very fast (native) | Distributed (network dependent) |
| Privacy | Fully client-side | Local but requires server | Decentralized (blockchain) |
| Offline | Yes (after first load) | Yes (server must run) | No (requires internet) |
| Censorship Resistance | High | Medium | Very High |
| Cost | Free (client resources) | Free (local hardware) | ICP cycles |
| Best For | Specific tasks, embeddings | Powerful chat/completion | Censorship-resistant apps |
Real-World Examples
Sentiment-Aware Journal Entries
// Using Transformers.js for sentiment analysis
import { useState, useEffect } from 'react'
import { pipeline } from '@xenova/transformers'
const sentimentAnalyzer = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english')
function JournalEntry({ text }) {
const [sentiment, setSentiment] = useState(null)
useEffect(() => {
async function analyze() {
const result = await sentimentAnalyzer(text)
setSentiment(result[0])
}
if (text) analyze()
}, [text])
return (
<div>
<p>{text}</p>
{sentiment && (
<span className={sentiment.label}>
Mood: {sentiment.label} ({(sentiment.score * 100).toFixed(1)}%)
</span>
)}
</div>
)
}
Code Documentation Generator
// Using Ollama.js for code analysis
import { Ollama } from 'ollama'
async function generateDocs(code) {
const ollama = new Ollama({ host: 'http://localhost:11434' })
const response = await ollama.generate({
model: 'codellama',
prompt: `Generate JSDoc documentation for this code:\n\n${code}`,
stream: false,
})
return response.response
}
// Usage in code editor
const code = `
function fibonacci(n) {
if (n <= 1) return n
return fibonacci(n - 1) + fibonacci(n - 2)
}
`
const docs = await generateDocs(code)
console.log(docs)
Smart Search with Embeddings
// Using Transformers.js for semantic search
import { pipeline } from '@xenova/transformers'
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')
// Generate embeddings for documents
const documents = [
'The cat sat on the mat',
'Dogs are great pets',
'Machine learning is fascinating',
]
const docEmbeddings = await Promise.all(
documents.map(doc => extractor(doc, { pooling: 'mean', normalize: true }))
)
// Search function
async function search(query) {
const queryEmbedding = await extractor(query, { pooling: 'mean', normalize: true })
// Cosine similarity
const scores = docEmbeddings.map((docEmb, idx) => {
const similarity = cosineSimilarity(queryEmbedding.data, docEmb.data)
return { document: documents[idx], score: similarity }
})
return scores.sort((a, b) => b.score - a.score)
}
function cosineSimilarity(a, b) {
let dotProduct = 0
let normA = 0
let normB = 0
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i]
normA += a[i] * a[i]
normB += b[i] * b[i]
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB))
}
// Search usage
const results = await search('feline animals')
console.log(results)
// [{ document: 'The cat sat on the mat', score: 0.82 }, ...]
Deployment Considerations
Browser Compatibility
- Ollama.js: Any modern browser with fetch API
- Transformers.js: Chrome 90+, Firefox 88+, Safari 15+
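Before loading a model, it can also help to feature-detect what the current browser supports. A minimal sketch using plain platform checks (not an API of either library):
// Capability check before downloading any models
function detectCapabilities() {
  return {
    wasm: typeof WebAssembly === 'object', // required by the default WASM backend
    webgpu: typeof navigator !== 'undefined' && 'gpu' in navigator, // optional acceleration
  }
}

const caps = detectCapabilities()
if (!caps.wasm) {
  console.warn('WebAssembly is unavailable; in-browser inference will not run')
}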
Bundle Size
// Lazy load to reduce initial bundle
const loadTransformers = () => import('@xenova/transformers')
// Use code splitting
const loadOllama = () => import('ollama')
Caching Strategies
// Service Worker for model caching
self.addEventListener('fetch', (event) => {
if (event.request.url.includes('huggingface.co')) {
event.respondWith(
caches.match(event.request).then((response) => {
return response || fetch(event.request).then((response) => {
const clone = response.clone()
caches.open('models').then((cache) => {
cache.put(event.request, clone)
})
return response
})
})
)
}
})
Resources
Ollama.js
Transformers.js
Approach 3: Internet Computer (Blockchain-Based Hosting)
The Internet Computer Protocol (ICP) enables blockchain-based AI applications through canister smart contracts. While the AI inference is currently centralized (run by DFINITY), your application logic executes in a tamper-proof, decentralized environment with transparent, auditable code execution.
What Makes ICP AI Different
Trustworthy Execution:
- Computation validated across 130+ independent data centers globally
- No single point of failure or centralized control
- AI agents execute only what their code allows with transparent, auditable logic
- Fully open-source infrastructure
Unique Capabilities:
- Financial Operations: Securely manage and transact digital assets within AI agents
- Massive Storage: Support for 500GB+ data per canister, ideal for RAG (retrieval-augmented generation) systems
- Multi-language Support: Build with Motoko, Rust, TypeScript, Python, or C++
- DAO Governance: AI agents can be tokenized and governed by decentralized communities
LLM Canister - Deploy AI Agents in Minutes
The official LLM Canister provides simple APIs for integrating large language models into your canisters. Currently supports Llama 3.1 8B and is free during the MVP phase.
Rust Example:
use ic_llm::{Model, Message};
// Simple prompt
let response = ic_llm::prompt(
Model::Llama3_1_8B,
"What's the speed of light?"
).await;
// Chat conversation
let messages = vec![
Message::system("You are a helpful assistant"),
Message::user("Explain blockchain in simple terms"),
];
let chat_response = ic_llm::chat(Model::Llama3_1_8B, messages).await;
Motoko Example:
import LLM "mo:llm";
// Simple prompt
let response = await LLM.prompt(
#Llama3_1_8B,
"What's the speed of light?"
);
// Chat conversation
let messages = [
{ role = #system_; content = "You are a helpful assistant" },
{ role = #user; content = "Explain blockchain in simple terms" }
];
let chatResponse = await LLM.chat(#Llama3_1_8B, messages);
TypeScript (via Azle 0.27.0+):
import { llm } from 'azle';
// Simple prompt
const response = await llm.prompt(
'Llama3_1_8B',
'What is the speed of light?'
);
// Multi-message chat
const messages = [
{ role: 'system', content: 'You are a helpful assistant' },
{ role: 'user', content: 'Explain blockchain simply' }
];
const chatResponse = await llm.chat('Llama3_1_8B', messages);
How LLM Canister Works
The system uses dedicated "AI workers" - stateless nodes that process LLM requests:
- Your canister sends a prompt to the LLM canister
- The request is queued and routed to available AI workers
- Workers execute the inference and return results
- Your canister receives the generated response
Current Limitations:
- Maximum 10 messages per chat request
- 10KiB prompt size limit
- 200-token output limit
- DFINITY controls the LLM canister and workers (decentralization planned)
Privacy: Prompts are not logged. DFINITY only tracks aggregate usage metrics.
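On the client side, one way to stay within the limits above is to trim the conversation history before each request. A rough sketch (the constants mirror the stated 10-message and 10KiB limits and are not part of any SDK):
// Rough sketch: keep a chat request within the MVP limits described above
const MAX_MESSAGES = 10
const MAX_PROMPT_BYTES = 10 * 1024

function fitToLimits(messages) {
  // Keep only the most recent messages
  let trimmed = messages.slice(-MAX_MESSAGES)
  // Drop the oldest remaining messages until the serialized payload fits the budget
  while (
    trimmed.length > 1 &&
    new TextEncoder().encode(JSON.stringify(trimmed)).length > MAX_PROMPT_BYTES
  ) {
    trimmed = trimmed.slice(1)
  }
  return trimmed
}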
Building a Full LLM Chatbot on ICP
Deploy a complete chatbot with React frontend and Rust backend canister.
Backend Canister (Rust):
use ic_cdk_macros::{query, update};
use ic_llm::{Model, Message};
#[update]
async fn chat(user_message: String, history: Vec<Message>) -> String {
let mut messages = history;
messages.push(Message::user(&user_message));
match ic_llm::chat(Model::Llama3_1_8B, messages).await {
Ok(response) => response,
Err(e) => format!("Error: {:?}", e)
}
}
#[query]
fn get_model_info() -> String {
"Using Llama 3.1 8B on Internet Computer".to_string()
}
Frontend Integration:
import { useState } from 'react'
import { Actor, HttpAgent } from '@dfinity/agent'
// idlFactory comes from the Candid declarations that dfx generates for your backend canister
// Connect to your deployed canister
const agent = new HttpAgent({ host: 'https://ic0.app' })
const canisterId = 'your-canister-id'
const actor = Actor.createActor(idlFactory, {
agent,
canisterId
})
// Chat with the AI
async function sendMessage(userMessage, conversationHistory) {
try {
const response = await actor.chat(userMessage, conversationHistory)
return response
} catch (error) {
console.error('Chat error:', error)
throw error
}
}
// React component
function ICPChatbot() {
const [messages, setMessages] = useState([])
const [input, setInput] = useState('')
const [loading, setLoading] = useState(false)
const handleSend = async () => {
if (!input.trim()) return
const userMsg = { role: 'user', content: input }
const newMessages = [...messages, userMsg]
setMessages(newMessages)
setInput('')
setLoading(true)
try {
const response = await sendMessage(input, messages)
setMessages([...newMessages, { role: 'assistant', content: response }])
} catch (error) {
console.error('Error:', error)
} finally {
setLoading(false)
}
}
return (
<div className="chatbot">
<div className="messages">
{messages.map((msg, idx) => (
<div key={idx} className={`message ${msg.role}`}>
<strong>{msg.role}:</strong> {msg.content}
</div>
))}
</div>
<div className="input-area">
<input
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyPress={(e) => e.key === 'Enter' && handleSend()}
disabled={loading}
/>
<button onClick={handleSend} disabled={loading}>
{loading ? 'Sending...' : 'Send'}
</button>
</div>
</div>
)
}
Retrieval-Augmented Generation (RAG) on ICP
Build RAG systems with embeddings stored directly in canisters for semantic search.
Motoko Canister for Embeddings:
import Array "mo:base/Array";
import Float "mo:base/Float";
import Time "mo:base/Time";
actor EmbeddingStore {
type Embedding = {
text: Text;
vector: [Float];
timestamp: Int;
};
stable var embeddings : [Embedding] = [];
stable var secretKey : Text = "your-secret-key";
// Store embedding
public shared func storeEmbedding(
key: Text,
text: Text,
vector: [Float]
) : async Bool {
if (key != secretKey) return false;
let newEmbedding : Embedding = {
text = text;
vector = vector;
timestamp = Time.now();
};
embeddings := Array.append(embeddings, [newEmbedding]);
true
};
// Retrieve all embeddings
public query func getEmbeddings(key: Text) : async ?[Embedding] {
if (key != secretKey) return null;
?embeddings
};
// Cosine similarity search
public query func search(
key: Text,
queryVector: [Float],
topK: Nat
) : async ?[Embedding] {
if (key != secretKey) return null;
// Calculate similarities and return top K results
// Implementation details omitted for brevity
?embeddings
};
}
Node.js Integration Layer:
import express from 'express';
import { HttpAgent, Actor } from '@dfinity/agent';
import { idlFactory } from './embedding-canister.did.js';
const app = express();
app.use(express.json());
const agent = new HttpAgent({ host: 'https://ic0.app' });
const actor = Actor.createActor(idlFactory, {
agent,
canisterId: process.env.CANISTER_ID
});
// Store embedding endpoint
app.post('/embeddings', async (req, res) => {
const { text, embedding } = req.body;
const result = await actor.storeEmbedding(
process.env.SECRET_KEY,
text,
embedding
);
res.json({ success: result });
});
// Retrieve embeddings endpoint
app.get('/embeddings', async (req, res) => {
const embeddings = await actor.getEmbeddings(process.env.SECRET_KEY);
res.json(embeddings);
});
// Semantic search endpoint
app.post('/search', async (req, res) => {
const { query_vector, top_k } = req.body;
const results = await actor.search(
process.env.SECRET_KEY,
query_vector,
top_k || 5
);
res.json(results);
});
app.listen(3000, () => console.log('RAG API running on port 3000'));
Quick Deployment with ICP Ninja
For Rapid Prototyping:
- Visit ICP Ninja and select the LLM Chatbot template
- Click "Deploy" to deploy directly to mainnet - no local setup required
- Get instant access to a working chatbot with Llama 3.1 8B
- Download the project files for local customization
For Local Development:
- Install DFINITY SDK
- Download Ollama and run it locally: ollama serve, then ollama run llama3.1:8b
- Deploy your canister: dfx deploy
Real-World Use Cases
Ecosystem Applications:
- ELNA.ai: Personal AI memory and knowledge management
- Anda: Conversational AI interface
- ALICE: Autonomous DAO agents
- Kinic: Decentralized search with AI
- Pickpump: DeFi tools with AI assistance
Why ICP for AI?
Versus Traditional Cloud:
- No vendor lock-in or platform censorship
- Transparent, auditable execution
- True data ownership and sovereignty
- Financial operations without intermediaries
Versus Client-Side (Transformers.js):
- Access to larger, more powerful models
- Shared compute resources across users
- Persistent memory and state management
- Integration with blockchain features
Versus Local (Ollama.js):
- No local infrastructure required
- Accessible from any device
- Collaborative AI agents
- Censorship-resistant hosting
Available Libraries & Resources
Official Libraries:
- Rust: ic-llm on docs.rs
- Motoko: mo:llm on mops.one
- TypeScript: Azle 0.27.0+
- Python & C++: Via DFINITY SDK
Documentation & Samples:
- LLM Chatbot Sample (Rust) - Complete chatbot implementation
- Introducing the LLM Canister - Official announcement and guide
- ICP Retrieval System Tutorial - RAG implementation guide
- AI Agents on Internet Computer - Overview and ecosystem
- GitHub: DFINITY LLM Examples - Code samples and templates
Conclusion
Running LLMs in decentralized ways opens up new possibilities for privacy-first, censorship-resistant AI applications. Choose Transformers.js for pure client-side inference, Ollama.js when users can run local servers, and Internet Computer for truly decentralized hosting that cannot be censored or controlled by any single entity.
All three approaches prioritize user privacy while delivering intelligent features. Start experimenting with these tools to build the next generation of decentralized AI applications!