A zero-network, WebGPU-powered conversational AI featuring an embedder-based intent classifier and a visual workflow editor for stateful dialogue.
https://github.com/davidbmar/speaker-generation-version-2-intents · private · shipped

Iris Kade is a privacy-first, browser-native AI assistant that runs entirely on client-side hardware using WebGPU. It combines local LLM inference (via web-llm) with a multi-lane RAG pipeline and a custom declarative workflow engine. The system uses vector embeddings to classify user intents and route conversations through defined state machines, enabling complex, context-aware interactions without sending data to external servers.
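The embedding-based intent routing described above could look roughly like the sketch below. It assumes utterance embeddings have already been computed and picks the closest example by cosine similarity. The `IntentExample` type, the `classifyIntent` function, the `"fallback"` label, and the `0.6` threshold are all hypothetical names and values chosen for illustration, not the project's actual API.

```typescript
// Hypothetical sketch: nearest-example intent classification over precomputed embeddings.
type Vector = number[];

interface IntentExample {
  intent: string;    // label such as "greeting" (illustrative)
  embedding: Vector; // embedding of one example utterance for that intent
}

// Cosine similarity between two vectors of equal length; returns 0 for a zero vector.
function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns the best-matching intent, or "fallback" when nothing clears the threshold.
function classifyIntent(
  query: Vector,
  examples: IntentExample[],
  threshold: number = 0.6 // assumed cutoff; tune per embedding model
): { intent: string; score: number } {
  let best = { intent: "fallback", score: -Infinity };
  for (const ex of examples) {
    const score = cosineSimilarity(query, ex.embedding);
    if (score > best.score) best = { intent: ex.intent, score };
  }
  return best.score >= threshold ? best : { intent: "fallback", score: best.score };
}
```

A workflow engine could then use the returned `intent` to select a transition, and treat `"fallback"` as "no transition matched, let the LLM answer freely".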
git clone https://github.com/davidbmar/speaker-generation-version-2-intents
cd speaker-generation-version-2-intents/web-app
npm install
npm run dev
flowchart TD
User[User Input] -->|Voice/Text| STT[Web Speech API]
STT --> FSM[Conversation FSM]
FSM -->|IDLE to PROCESSING| Pipeline[RAG Pipeline]
Pipeline -->|Retrieve & Rerank| Context[Context Composer]
Context --> LLM[Local LLM via web-llm]
LLM -->|Stream Tokens| Buffer[Sentence Buffer]
Buffer --> TTS[vits-web TTS]
TTS --> Audio[Audio Output]
FSM -->|Interrupt| Pipeline
FSM -->|Reset| IDLE((IDLE State))
subgraph Local Browser
STT
FSM
Pipeline
LLM
TTS
end
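The Sentence Buffer step in the flowchart sits between token streaming and TTS. A minimal sketch of that idea follows, assuming a sentence ends at `.`, `!`, or `?` followed by whitespace. The `SentenceBuffer` class and its `push`/`flush` methods are hypothetical names, and the project's real boundary rules may differ.

```typescript
// Hypothetical sketch: accumulate streamed tokens and emit whole sentences for TTS.
class SentenceBuffer {
  private pending = "";

  // Append one streamed token; return any sentences that are now complete.
  push(token: string): string[] {
    this.pending += token;
    const sentences: string[] = [];
    // Boundary: terminal punctuation followed by at least one whitespace character.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next token.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call once generation ends to emit whatever text remains.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Waiting for whitespace after the punctuation avoids splitting on a period that arrives mid-token, at the cost of holding the final sentence until `flush()`. This simple rule will still split on abbreviations such as "e.g. ", so a production buffer would likely need extra handling.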
The application is built with TypeScript and runs in the browser. It leverages web-llm for local LLM inference, vits-web for streaming text-to-speech, and the Web Speech API for speech-to-text. A custom workflow engine parses JSONL definitions into interactive state graphs. Vector search is handled via Web Workers using ONNX embeddings, while the UI manages a 4-state conversation FSM (IDLE, PROCESSING, SPEAKING, INTERRUPTED).
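The 4-state conversation FSM can be pictured as a small transition table. The sketch below uses the four state names from the text; the event names (`userInput`, `firstSentence`, `playbackDone`, `interrupt`, `reset`) and the exact set of allowed transitions are assumptions made for illustration and may not match the project's code.

```typescript
// Hypothetical sketch of the IDLE / PROCESSING / SPEAKING / INTERRUPTED state machine.
type ConvState = "IDLE" | "PROCESSING" | "SPEAKING" | "INTERRUPTED";
type ConvEvent = "userInput" | "firstSentence" | "playbackDone" | "interrupt" | "reset";

// Assumed transition table: any (state, event) pair not listed here is ignored.
const convTransitions: Record<ConvState, Partial<Record<ConvEvent, ConvState>>> = {
  IDLE: { userInput: "PROCESSING" },
  PROCESSING: { firstSentence: "SPEAKING", interrupt: "INTERRUPTED" },
  SPEAKING: { playbackDone: "IDLE", interrupt: "INTERRUPTED" },
  INTERRUPTED: { reset: "IDLE" },
};

class ConversationFsm {
  private current: ConvState = "IDLE";

  getState(): ConvState {
    return this.current;
  }

  // Apply an event and return the resulting state; invalid events leave the state unchanged.
  dispatch(event: ConvEvent): ConvState {
    const next = convTransitions[this.current][event];
    if (next !== undefined) this.current = next;
    return this.current;
  }
}
```

Keeping the transitions in a data table rather than in nested conditionals makes it easy to see at a glance that, for example, an interrupt is only meaningful while processing or speaking.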
sequenceDiagram
participant U as User
participant UI as UI/FSM
participant W as Workflow/Intent
participant RAG as RAG Pipeline
participant LLM as WebLLM
participant TTS as VITS Web
U->>UI: Speak/Type Input
UI->>W: Classify Intent
W->>W: Match State/Transition
W->>RAG: Retrieve Context
RAG->>RAG: Vector + Lexical Rerank
RAG->>LLM: Compose Prompt
LLM->>LLM: Generate Tokens
LLM->>TTS: Stream Sentences
TTS->>U: Play Audio
U->>UI: Interrupt (Optional)
UI->>LLM: Abort Generation
UI->>TTS: Stop Audio
UI->>W: Reset to IDLE
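The "Vector + Lexical Rerank" step in the sequence above combines two relevance signals. One simple way to do that, shown here purely as an illustration, is a weighted sum of a vector similarity score and a term-overlap score. The `Candidate` type, the `alpha` weight of `0.7`, and the overlap formula are assumptions; the project may use a different fusion method.

```typescript
// Hypothetical sketch: rerank retrieved chunks by blending vector and lexical scores.
interface Candidate {
  id: string;
  text: string;
  vectorScore: number; // similarity from the vector search, assumed to be in [0, 1]
}

// Lowercase, split on non-alphanumerics, and drop duplicates.
function uniqueTokens(text: string): string[] {
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  return tokens.filter((t, i) => tokens.indexOf(t) === i);
}

// Fraction of distinct query terms that also appear in the candidate text.
function lexicalScore(query: string, text: string): number {
  const queryTerms = uniqueTokens(query);
  if (queryTerms.length === 0) return 0;
  const docTerms = uniqueTokens(text);
  let hits = 0;
  for (const term of queryTerms) {
    if (docTerms.indexOf(term) !== -1) hits++;
  }
  return hits / queryTerms.length;
}

// Returns new candidate objects sorted by blended score, highest first.
function rerank(
  query: string,
  candidates: Candidate[],
  alpha: number = 0.7 // assumed weight on the vector score
): Array<Candidate & { score: number }> {
  return candidates
    .map((c) => ({
      ...c,
      score: alpha * c.vectorScore + (1 - alpha) * lexicalScore(query, c.text),
    }))
    .sort((a, b) => b.score - a.score);
}
```

With `alpha = 0.7`, a chunk that matches every query term can outrank one with a slightly higher vector score but no term overlap, which is the usual motivation for adding a lexical signal.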
Use this project as a foundation for building private, offline-capable AI assistants or interactive narrative engines. The modular workflow system allows developers to define complex conversation logic declaratively, while the RAG pipeline can be repurposed for domain-specific knowledge retrieval. It also serves as a reference implementation for WebGPU-based AI applications that run entirely in the browser.
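To make "declarative conversation logic" concrete, here is a rough sketch of turning JSONL workflow lines into a state graph. The line schema (`from`, `intent`, `to`) is invented for this example; the project's actual JSONL format is not documented in this README, so treat every field and function name below as hypothetical.

```typescript
// Hypothetical sketch: build a state graph from JSONL, one transition per line.
interface TransitionDef {
  from: string;   // source state (assumed field name)
  intent: string; // intent label that triggers the transition (assumed field name)
  to: string;     // target state (assumed field name)
}

type StateGraph = { [state: string]: { [intent: string]: string } };

function parseWorkflow(jsonl: string): StateGraph {
  // Null-prototype objects so state names like "constructor" cannot collide with built-ins.
  const graph: StateGraph = Object.create(null);
  const lines = jsonl.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue; // tolerate blank lines
    const def = JSON.parse(line) as TransitionDef;
    if (
      typeof def.from !== "string" ||
      typeof def.intent !== "string" ||
      typeof def.to !== "string"
    ) {
      throw new Error("line " + (i + 1) + ": expected string fields from, intent, to");
    }
    if (graph[def.from] === undefined) graph[def.from] = Object.create(null);
    graph[def.from][def.intent] = def.to;
  }
  return graph;
}

// Follow a transition if one exists for this intent; otherwise stay in the same state.
function nextState(graph: StateGraph, state: string, intent: string): string {
  const edges = graph[state];
  if (edges === undefined) return state;
  const target = edges[intent];
  return target === undefined ? state : target;
}
```

For example, the two lines `{"from":"start","intent":"greet","to":"greeted"}` and `{"from":"greeted","intent":"bye","to":"start"}` would describe a two-state loop under this assumed schema.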