A local, voice-powered AI agent that runs in your browser with Metal-accelerated speech recognition and tool execution.
https://github.com/davidbmar/2026-nano-claw-voice-loop-tts-stt · public · shipped

nano-claw is a personal AI assistant designed to run locally. It creates a continuous voice loop: the user speaks to the browser, Whisper transcribes the audio (running natively on the Mac for GPU speed), Claude processes the text via the Anthropic API, and Kokoro TTS speaks the reply. It also supports interactive tool approval, so the AI asks for permission before executing shell commands or file operations on the host machine.
git clone https://github.com/davidbmar/2026-nano-claw-voice-loop-tts-stt.git
cd 2026-nano-claw-voice-loop-tts-stt
export ANTHROPIC_API_KEY=sk-ant-...
./run.sh
flowchart TD
User[User Browser] <-->|WebRTC Audio / WebSocket| VS[Voice Server: Python/Docker]
VS <-->|HTTP POST /transcribe| STT[STT Service: faster-whisper/Native Mac]
VS <-->|HTTP API| API[nano-claw API: TypeScript/Docker]
API <-->|LLM Request| Claude[Anthropic Claude API]
API -->|Execute| Tools[Local Tools: Shell/File]
subgraph Docker Container
VS
API
end
subgraph Host Mac
STT
Tools
end
The system uses a hybrid architecture to work around Docker's GPU limitations on macOS. A native Python service runs `faster-whisper` on port 8200 using Apple Metal acceleration. The core logic runs inside a Docker container: the TypeScript API server and the Python-based Voice Server, which handles WebRTC and Kokoro TTS. The browser client connects to the Voice Server over WebRTC for real-time audio and over WebSockets for UI updates.
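As a rough illustration of the hybrid split, the sketch below shows how a containerized service might address the native STT process. Port 8200 and the `/transcribe` path come from the description above. The `host.docker.internal` hostname (Docker Desktop's alias for the host machine), the raw-bytes request body, the `audio/wav` content type, and the `text` key in the JSON response are assumptions made for illustration, not details confirmed by the project.

```python
import json
import urllib.request

# Assumption: inside Docker Desktop on macOS, the host is reachable under this name.
STT_HOST = "host.docker.internal"
STT_PORT = 8200  # port named in the architecture description


def build_transcribe_request(
    audio: bytes, host: str = STT_HOST, port: int = STT_PORT
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/transcribe",
        data=audio,
        method="POST",
        headers={"Content-Type": "audio/wav"},  # assumed content type
    )


def transcribe(audio: bytes, timeout: float = 30.0) -> str:
    """Send audio to the STT service and return the transcript.

    Assumes the service replies with JSON shaped like {"text": "..."}.
    """
    request = build_transcribe_request(audio)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["text"]
```

Separating request construction from sending keeps the addressing logic easy to check without a running STT service. The real wire format may differ, for example by using multipart form data.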
sequenceDiagram
participant U as User
participant B as Browser Client
participant V as Voice Server (Docker)
participant S as STT Service (Native)
participant A as Agent API (Docker)
participant L as Claude API
U->>B: Hold button & Speak
B->>V: Stream Audio (WebRTC)
U->>B: Release button
V->>S: POST audio bytes
S-->>V: Transcribed Text
V->>A: POST text message
A->>L: Send prompt + context
L-->>A: Response + Tool Calls?
alt Tool Call Required
A-->>V: Pending tool request
V-->>B: Show approval card
B->>U: Display approval
U->>B: Approve/Reject
B->>V: Send decision
V->>A: Confirm execution
A->>A: Execute Tool
A->>L: Send tool result
L-->>A: Final response
end
A-->>V: Final text response
V->>V: Generate Speech (Kokoro TTS)
V->>B: Stream Audio Response
B->>U: Play audio
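
The approval branch in the sequence above can be sketched as a small loop that refuses to run any tool until the user has decided. This is a minimal sketch, not the project's implementation: `ToolCall`, `ModelReply`, `run_turn`, and `fake_model` are hypothetical names, the model is replaced by a stand-in function, and feeding a rejection message back to the model is an assumption about how declined requests are handled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str      # hypothetical tool identifier, e.g. "shell"
    argument: str  # e.g. the command line the model wants to run


@dataclass
class ModelReply:
    text: str
    tool_call: Optional[ToolCall] = None


def run_turn(
    user_text: str,
    ask_model: Callable[[list], ModelReply],
    ask_user: Callable[[ToolCall], bool],
    run_tool: Callable[[ToolCall], str],
) -> str:
    """Run one agent turn, pausing for approval before any tool executes."""
    history = [("user", user_text)]
    reply = ask_model(history)
    while reply.tool_call is not None:
        call = reply.tool_call
        if ask_user(call):
            result = run_tool(call)
        else:
            # Assumption: a rejection is reported back to the model as text.
            result = "Tool call rejected by user."
        history.append(("tool", result))
        reply = ask_model(history)
    return reply.text


def fake_model(history: list) -> ModelReply:
    """Stand-in for the LLM: request one tool call, then summarize its result."""
    role, content = history[-1]
    if role == "tool":
        return ModelReply(text=f"Done: {content}")
    return ModelReply(text="", tool_call=ToolCall("shell", "ls"))
```

Passing `ask_user` and `run_tool` in as callables keeps the gate independent of the transport, so the same loop could sit behind an approval card in the browser or a prompt in a terminal.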
Use this project as a foundation for building private, voice-first AI agents that require low-latency speech-to-text on Apple Silicon. It demonstrates how to bridge native high-performance ML services with containerized application logic and secure tool execution patterns.