A zero-build, vanilla JS browser chat interface supporting Ollama, OpenAI, Anthropic, and in-browser WebGPU/WASM inference.
https://github.com/davidbmar/browser-llm-local-ai-chat · public · shipped
A lightweight, privacy-focused AI chat client that runs entirely in the browser without a build step. It supports multiple inference backends: local Ollama instances, cloud providers (OpenAI and Anthropic), and client-side models via WebLLM or Wllama. The application includes a tool-calling framework, file analysis for CSV/TSV data, and conversation export, all driven from a responsive UI with settings persisted locally.
git clone https://github.com/davidbmar/browser-llm-local-ai-chat.git
cd browser-llm-local-ai-chat
python3 -m http.server 8000
flowchart TD
User[User Browser] -->|Interacts| UI[index.html / js/app.js]
UI -->|Manages State| Settings[lib/inference/settings-store.js]
UI -->|Requests Inference| Client[lib/inference/inference-client.js]
Client -->|Routes To| Registry[lib/inference/provider-registry.js]
Registry -->|Adapter| Ollama[providers/ollama.js]
Registry -->|Adapter| OpenAI[providers/openai.js]
Registry -->|Adapter| Anthropic[providers/anthropic.js]
Registry -->|Adapter| InBrowser[providers/in-browser.js]
InBrowser -->|WebGPU| WebLLM[engines/webllm]
InBrowser -->|WASM| Wllama[engines/wllama]
Ollama -->|HTTP| LocalOllama[Local Ollama Instance]
OpenAI -->|HTTPS| CloudOpenAI[OpenAI API]
Anthropic -->|HTTPS| CloudAnthropic[Anthropic API]
UI -->|Processes| FileProc[lib/file-processing/*]
UI -->|Executes| Tools[js/tool-calling.js]
The core web app is built with vanilla JavaScript ES modules and requires no framework or bundler. A ProviderRegistry applies the adapter pattern to normalize interactions with the different LLM APIs, and inference output is streamed token by token through async generators. The desktop version wraps the web app in Electron and manages a bundled Ollama binary as a child process. State is persisted in localStorage; security relies on context isolation in Electron and XSS-safe DOM rendering.
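The adapter pattern described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `streamChat` method name, the registry's `register`/`get` methods, and the `echoAdapter` stand-in are hypothetical and are not taken from the project's source. The point is only that each adapter exposes the same async-generator interface, so the caller can consume tokens without knowing which backend produced them.

```javascript
// Hypothetical sketch: names and shapes are illustrative, not the project's API.
class ProviderRegistry {
  constructor() {
    this.providers = new Map();
  }

  // Register an adapter under an id; every adapter must expose streamChat().
  register(id, adapter) {
    if (typeof adapter.streamChat !== "function") {
      throw new TypeError(`Provider "${id}" must implement streamChat()`);
    }
    this.providers.set(id, adapter);
  }

  // Look up an adapter, failing loudly on unknown ids.
  get(id) {
    const adapter = this.providers.get(id);
    if (!adapter) throw new Error(`Unknown provider: ${id}`);
    return adapter;
  }
}

// Stand-in adapter that "streams" a canned reply one word at a time.
// A real adapter would translate messages into its backend's request format.
const echoAdapter = {
  async *streamChat(messages) {
    const last = messages[messages.length - 1];
    for (const word of `You said: ${last.content}`.split(" ")) {
      yield word + " ";
    }
  },
};

// Consume any registered provider through the same for-await loop.
async function collect(registry, providerId, messages) {
  let text = "";
  for await (const token of registry.get(providerId).streamChat(messages)) {
    text += token;
  }
  return text.trimEnd();
}
```

With this shape, adding a backend means registering one more object that implements the shared method; the consuming loop in `collect` stays unchanged.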
sequenceDiagram
participant U as User
participant App as js/app.js
participant Gen as js/generate.js
participant Client as lib/inference/inference-client.js
participant Prov as Provider Adapter
participant Model as LLM Backend
U->>App: Click Send Message
App->>Gen: generate(messages, options)
Gen->>Client: streamResponse(providerId, payload)
Client->>Prov: createCompletion(payload)
Prov->>Model: Send request (e.g. POST /chat/completions)
loop Streaming Tokens
Model-->>Prov: Token Chunk
Prov-->>Client: Yield Chunk
Client-->>Gen: Yield Chunk
Gen->>App: Update UI with Token
end
Gen->>App: Finalize Response
App->>U: Display Full Message
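The layered flow in the diagram above can be sketched as a chain of async generators. The function names `generate`, `streamResponse`, and `createCompletion` come from the diagram, but their signatures, the `onToken` callback, and the hard-coded chunks are assumptions for illustration; in particular, this sketch passes the provider as a function rather than looking it up by `providerId`.

```javascript
// Hypothetical sketch of the diagram's call chain; signatures are assumed.

// Stand-in provider adapter: yields chunks as a backend might stream them.
async function* createCompletion(payload) {
  for (const chunk of ["Hel", "lo", ", ", "world"]) {
    yield chunk;
  }
}

// Client layer: forwards the provider's chunks unchanged.
async function* streamResponse(provider, payload) {
  yield* provider(payload);
}

// Generation layer: reports each chunk to the UI, then returns the full text.
async function generate(messages, { provider, onToken }) {
  let full = "";
  for await (const chunk of streamResponse(provider, { messages })) {
    full += chunk;
    onToken(chunk); // e.g. append the chunk to the visible message
  }
  return full; // the finalized response
}
```

A caller would pass an `onToken` callback that updates the page incrementally and then use the returned string as the finalized message.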
Clone the repository and serve the static files with any simple HTTP server (e.g., Python's http.server). Then either connect to a local Ollama instance, which must be configured to allow cross-origin (CORS) requests from the page, or select the in-browser backend to run models directly on the client's GPU (WebGPU) or CPU (WASM). For desktop use, install the dependencies and run the app via Electron, which bundles the Ollama binary so it can run offline.
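Before selecting the Ollama backend, it can help to confirm that the page can actually reach the local server. The sketch below is not part of the project: `listOllamaModels` is a hypothetical helper, and it assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint for listing installed models. If the request fails in the browser while Ollama is running, the likely cause is that the page's origin is not allowed; Ollama's allowed origins are typically set through its `OLLAMA_ORIGINS` environment variable.

```javascript
// Hypothetical helper: checks whether a local Ollama server is reachable
// from this page and lists its installed models.
// fetchImpl is injectable so the function can be exercised without a server.
async function listOllamaModels(
  baseUrl = "http://localhost:11434", // assumed default Ollama address
  fetchImpl = fetch
) {
  try {
    const res = await fetchImpl(`${baseUrl}/api/tags`);
    if (!res.ok) {
      return { ok: false, reason: `HTTP ${res.status}` };
    }
    const data = await res.json();
    const models = (data.models ?? []).map((m) => m.name);
    return { ok: true, models };
  } catch (err) {
    // In a browser, a blocked cross-origin request and an unreachable
    // server both surface here as a network error.
    return { ok: false, reason: String(err?.message ?? err) };
  }
}
```

In the browser console, `listOllamaModels().then(console.log)` would print either the model names or the failure reason.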
✓ all on main — nothing unmerged.