browser-llm-local-ai-chat · davidbmar.com

What it is

A lightweight, privacy-focused AI chat client that runs entirely in the browser and needs no build step. It supports several inference backends: local Ollama instances, cloud providers (OpenAI and Anthropic), and client-side models via WebLLM or Wllama. The application includes a tool-calling framework, file analysis for CSV/TSV data, and conversation export, all behind a responsive UI with settings persisted locally.

Features

Multi-backend support: Ollama, OpenAI, Anthropic, and in-browser WebGPU/WASM
Extensible tool-calling framework with approval controls and prompt-based fallbacks
File attachment and analysis for CSV/TSV with automatic statistical detection
Real-time token streaming with performance metrics and timing stats
Privacy mode restricting connections to local-only backends
Conversation export to PDF, RTF, Markdown, and HTML
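
To illustrate the privacy-mode feature listed above, here is a minimal sketch of how a local-only restriction could be enforced, assuming each backend is described by a base URL. The names `isLocalEndpoint` and `allowedBackends`, and the shape of the backend objects, are hypothetical and are not taken from the project's code.

```javascript
// Hypothetical helper: hosts that count as "local" for privacy mode.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// Returns true when the URL points at a loopback host.
function isLocalEndpoint(baseUrl) {
  try {
    return LOCAL_HOSTS.has(new URL(baseUrl).hostname);
  } catch {
    return false; // unparseable or missing URLs are treated as non-local
  }
}

// Assumption: in-browser engines have no base URL and are flagged
// with `inBrowser: true` instead.
function allowedBackends(backends, privacyMode) {
  if (!privacyMode) return backends;
  return backends.filter(
    (b) => b.inBrowser === true || isLocalEndpoint(b.baseUrl)
  );
}
```

With privacy mode on, a list containing a local Ollama entry, a cloud entry, and an in-browser entry would be reduced to the Ollama and in-browser entries.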

Quickstart

git clone https://github.com/davidbmar/browser-llm-local-ai-chat.git
cd browser-llm-local-ai-chat
python3 -m http.server 8000

Architecture

flowchart TD
    User[User Browser] -->|Interacts| UI[index.html / js/app.js]
    UI -->|Manages State| Settings[lib/inference/settings-store.js]
    UI -->|Requests Inference| Client[lib/inference/inference-client.js]
    Client -->|Routes To| Registry[lib/inference/provider-registry.js]
    Registry -->|Adapter| Ollama[providers/ollama.js]
    Registry -->|Adapter| OpenAI[providers/openai.js]
    Registry -->|Adapter| Anthropic[providers/anthropic.js]
    Registry -->|Adapter| InBrowser[providers/in-browser.js]
    InBrowser -->|WebGPU| WebLLM[engines/webllm]
    InBrowser -->|WASM| Wllama[engines/wllama]
    Ollama -->|HTTP| LocalOllama[Local Ollama Instance]
    OpenAI -->|HTTPS| CloudOpenAI[OpenAI API]
    Anthropic -->|HTTPS| CloudAnthropic[Anthropic API]
    UI -->|Processes| FileProc[lib/file-processing/*]
    UI -->|Executes| Tools[js/tool-calling.js]

How it's built

Built with vanilla JavaScript ES modules, requiring no framework or bundler for the core web app. It uses an adapter pattern via a ProviderRegistry to normalize interactions with different LLM APIs. Inference is handled by async generators for streaming tokens. The desktop version wraps the web app in Electron, managing a bundled Ollama binary via child processes. State is persisted in localStorage, and security is maintained through context isolation in Electron and XSS-safe DOM rendering.
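
The adapter and async-generator ideas described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `register`/`get` methods, the `{ text }` chunk shape, and the `echoAdapter` stand-in are hypothetical. Only the `ProviderRegistry` and `createCompletion` names appear on this page.

```javascript
// Hypothetical registry: maps a provider id to an adapter object.
class ProviderRegistry {
  constructor() {
    this.adapters = new Map();
  }
  register(id, adapter) {
    this.adapters.set(id, adapter);
  }
  get(id) {
    const adapter = this.adapters.get(id);
    if (!adapter) throw new Error(`Unknown provider: ${id}`);
    return adapter;
  }
}

// Stand-in adapter with a fake backend. A real adapter would call an
// HTTP API or an in-browser engine and parse its streamed response.
const echoAdapter = {
  async *createCompletion({ messages }) {
    const last = messages[messages.length - 1].content;
    for (const word of last.split(" ")) {
      yield { text: word + " " }; // one normalized chunk per word
    }
  },
};

// Calling code looks the same whichever adapter is registered.
async function collect(registry, providerId, payload) {
  let text = "";
  const stream = registry.get(providerId).createCompletion(payload);
  for await (const chunk of stream) {
    text += chunk.text;
  }
  return text.trim();
}
```

Because every adapter exposes the same async-generator method, the caller iterates with `for await` and never needs to know which backend produced the chunks.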

How it runs

sequenceDiagram
    participant U as User
    participant App as js/app.js
    participant Gen as js/generate.js
    participant Client as lib/inference/inference-client.js
    participant Prov as Provider Adapter
    participant Model as LLM Backend

    U->>App: Click Send Message
    App->>Gen: generate(messages, options)
    Gen->>Client: streamResponse(providerId, payload)
    Client->>Prov: createCompletion(payload)
    Prov->>Model: POST /chat/completions
    loop Streaming Tokens
        Model-->>Prov: Token Chunk
        Prov-->>Client: Yield Chunk
        Client-->>Gen: Yield Chunk
        Gen->>App: Update UI with Token
    end
    Gen->>App: Finalize Response
    App->>U: Display Full Message
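
The streaming loop in the diagram above, together with the timing stats mentioned under Features, might look roughly like this. The function `consumeStream`, its parameters, and the returned fields are hypothetical names chosen for illustration. The injectable `now` clock is an assumption that makes the sketch easy to test.

```javascript
// Hypothetical consumer: reads any async iterable of { text } chunks,
// forwards each chunk to the UI, and records simple timing stats.
async function consumeStream(stream, onToken, now = () => Date.now()) {
  const start = now();
  let firstTokenAt = null;
  let text = "";
  let chunks = 0;

  for await (const chunk of stream) {
    if (firstTokenAt === null) firstTokenAt = now();
    text += chunk.text;
    chunks += 1;
    onToken(chunk.text); // e.g. append to the message element
  }

  const totalMs = now() - start;
  return {
    text,
    chunks,
    timeToFirstTokenMs: firstTokenAt === null ? null : firstTokenAt - start,
    totalMs,
    // A chunk is not necessarily one token, so this is only a rough rate.
    chunksPerSecond: totalMs > 0 ? (chunks / totalMs) * 1000 : null,
  };
}
```

The returned object carries both the finalized message text and the metrics, which matches the "Finalize Response" step in the sequence.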

How to apply & reuse

Clone the repository and serve the static files with a simple HTTP server (e.g., Python's http.server), then open the served address in a browser. Connect to a local Ollama instance, making sure it accepts cross-origin requests from the page (Ollama controls this through its OLLAMA_ORIGINS environment variable), or select the in-browser backend to run models directly on the client's GPU/CPU. For desktop usage, install dependencies and run via Electron, which bundles the Ollama binary so the app can run offline.
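
When wiring up a local Ollama instance, a quick reachability check helps distinguish a stopped server or a CORS rejection from a missing model. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, which lists installed models. The function name `listOllamaModels` and the injectable `fetchImpl` parameter are hypothetical.

```javascript
// Ollama's default local address.
const OLLAMA_URL = "http://localhost:11434";

// Hypothetical helper: reports whether Ollama answered and which
// models it lists. `fetchImpl` is injectable so the function can be
// exercised without a running server.
async function listOllamaModels(baseUrl = OLLAMA_URL, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/api/tags`);
    if (!res.ok) return { reachable: false, models: [] };
    const data = await res.json();
    const models = (data.models ?? []).map((m) => m.name);
    return { reachable: true, models };
  } catch {
    // In a browser, network failures and CORS rejections both surface
    // as a rejected fetch, so they are reported the same way here.
    return { reachable: false, models: [] };
  }
}
```

If the check reports `reachable: false` while Ollama is running, the cross-origin setting is a likely cause worth checking first.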

At a glance

Capabilities: Multi-provider LLM inference, Client-side model execution, Tool use and function calling, Tabular file analysis, Conversation persistence, Desktop app bundling

Components: js/app.js, js/generate.js, js/chat.js, js/tool-calling.js, lib/inference/provider-registry.js, lib/inference/inference-client.js, desktop/main.js, desktop/ollama-manager.js

Tech: JavaScript, ES Modules, WebGPU, WebAssembly, Electron, HTML5, CSS3

Depends on: Ollama (optional), Python 3 (for local serving), Node.js (for Electron build)

Integrates with: Ollama API, OpenAI API, Anthropic API, WebLLM, Wllama

Patterns: Adapter Pattern, Observer Pattern (Event Listeners), Singleton (Settings Store), Async Generator (Streaming), Context Isolation (Electron)

Reuse tags: ai-chat, local-llm, webgpu, vanilla-js, electron-app, privacy-focused