A minimal, self-contained voice conversation system connecting browser microphone to Claude via WebRTC, using local Whisper for STT and Piper for TTS.
https://github.com/davidbmar/voice-only-UI-STT-TTS-base · public · shipped

A real-time voice interface that runs entirely in the browser and a local Python server. It captures audio via WebRTC, performs Voice Activity Detection (VAD), transcribes speech locally using faster-whisper, sends text to Anthropic's Claude API, synthesizes response audio locally using Piper TTS, and streams it back to the browser. It supports barge-in (interrupting the assistant) and runtime configuration via an admin panel.
git clone https://github.com/davidbmar/voice-only-UI-STT-TTS-base.git cd voice-only-UI-STT-TTS-base python -m venv .venv source .venv/bin/activate pip install -r requirements.txt cp .env.example .env # Edit .env to set ANTHROPIC_API_KEY python server.py
flowchart TD
subgraph Client [Browser]
Mic[Microphone]
WebRTC_Client[WebRTC Peer]
Speaker[Speaker]
end
subgraph Server [Python Server]
Signaling[FastAPI / WebSocket]
Peer[aiortc PeerConnection]
VAD[Voice Activity Detector]
STT[faster-whisper]
LLM[Claude API]
TTS[Piper TTS]
AudioQueue[Audio Queue]
end
Mic -->|Audio Frames| WebRTC_Client
WebRTC_Client -->|Opus Audio| Peer
Peer -->|PCM Audio| VAD
VAD -->|Speech Detected| STT
STT -->|Transcript| LLM
LLM -->|Response Text| TTS
TTS -->|PCM Audio| AudioQueue
AudioQueue -->|Opus Audio| Peer
Peer -->|Audio Stream| WebRTC_Client
WebRTC_Client --> Speaker
Signaling <-->|SDP Offer/Answer| WebRTC_Client
Signaling -->|Controls| VAD
Signaling -->|Controls| STT
Signaling -->|Controls| TTS
Built with Python 3.11+ using FastAPI for the web server and aiortc for WebRTC handling. Speech-to-Text uses faster-whisper (CPU-based), Text-to-Speech uses Piper ONNX models, and the LLM integration targets Anthropic's Claude. The frontend is a simple HTML/JS client handling getUserMedia and WebRTC peer connection.
sequenceDiagram
participant Browser
participant Server
participant Whisper as faster-whisper
participant Claude as Claude API
participant Piper as Piper TTS
Browser->>Server: HTTP GET /
Server-->>Browser: Serve HTML/JS UI
Browser->>Server: WebSocket Connect /ws
Browser->>Server: WebRTC Offer (SDP)
Server-->>Browser: WebRTC Answer (SDP)
Browser->>Server: Audio Frames (Opus)
Server->>Server: VAD Detection
alt Speech Detected
Server->>Whisper: Transcribe Audio
Whisper-->>Server: Text Transcript
Server->>Claude: Send Prompt
Claude-->>Server: Response Text
Server->>Piper: Synthesize Speech
Piper-->>Server: Audio Chunks
loop Streaming
Server->>Browser: Audio Frames (Opus)
Browser->>Server: Barge-in Audio (if interrupting)
end
end
Use as a foundational template for building voice-enabled AI assistants without complex state machines. Ideal for developers needing a low-latency, privacy-conscious (local STT/TTS) voice interface prototype that can be extended with custom logic or different LLM providers.
✓ all on main — nothing unmerged.