Composable Python infrastructure for voice AI apps: WebRTC transport, edge authentication, and pluggable STT/TTS/LLM engines.
https://github.com/davidbmar/voice-frontend-modules-auth.transport.engine · public · shipped
A modular toolkit for building real-time voice applications. It decouples connectivity (WebRTC/TURN), security (JWT/Cloudflare Access), and AI processing (STT/TTS/LLM) into three independent packages. Developers can mix and match components, or swap the reference 'engine-starter' for production-grade providers, while keeping a consistent interface.
pip install voice-frontend[all]
uvicorn examples.minimal-voice-app.server:app --port 8090
flowchart TD
Client[Browser/JS Client] -->|WebSocket| Signaling[Signaling Server]
Client -->|WebRTC Media| Session[WebRTC Session]
Signaling -->|Auth Check| Auth[Edge Auth Middleware]
Auth -->|Validate| Providers[Auth Providers]
Session -->|Audio In| STT[STT Provider]
STT -->|Text| LLM[LLM Provider]
LLM -->|Text Response| TTS[TTS Provider]
TTS -->|Audio Out| Session
Session -->|Media Stream| Client
subgraph Infrastructure
Signaling
Auth
Session
end
subgraph AI Engine
STT
LLM
TTS
end
Python 3.9+ backend using FastAPI for signaling and WebSocket handling. The transport layer manages WebRTC sessions, ICE gathering, and TURN credentials. Edge-auth provides middleware for HTTP and WebSocket authentication. The engine layer defines Abstract Base Classes (ABCs) for speech-to-text, text-to-speech, and LLM interactions, implemented by reference starters (Whisper, Piper, Ollama) that can be replaced by custom implementations.
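The ABC-based engine layer described above could look roughly like the following minimal sketch. The class names (`STTProvider`, `LLMProvider`, `TTSProvider`), the method names `transcribe` and `synthesize`, and the toy implementations are assumptions made for illustration; the package's actual interfaces may differ.

```python
import asyncio
from abc import ABC, abstractmethod


# NOTE: all names below are hypothetical, not the package's real API.
class STTProvider(ABC):
    """Speech-to-text interface: audio bytes in, text out."""

    @abstractmethod
    async def transcribe(self, audio: bytes) -> str:
        ...


class LLMProvider(ABC):
    """LLM interface: user text in, response text out."""

    @abstractmethod
    async def chat(self, prompt: str) -> str:
        ...


class TTSProvider(ABC):
    """Text-to-speech interface: text in, audio bytes out."""

    @abstractmethod
    async def synthesize(self, text: str) -> bytes:
        ...


# Toy stand-ins for real engines such as Whisper, Ollama, or Piper.
class EchoSTT(STTProvider):
    async def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseLLM(LLMProvider):
    async def chat(self, prompt: str) -> str:
        return prompt.upper()


class BytesTTS(TTSProvider):
    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


async def run_turn(stt: STTProvider, llm: LLMProvider,
                   tts: TTSProvider, audio_in: bytes) -> bytes:
    """One STT -> LLM -> TTS pass, written only against the ABCs."""
    text = await stt.transcribe(audio_in)
    reply = await llm.chat(text)
    return await tts.synthesize(reply)
```

Because `run_turn` depends only on the abstract interfaces, any of the three toy classes can be replaced by another implementation without touching the pipeline code.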
sequenceDiagram
participant Browser as Browser Client
participant FastAPI as FastAPI App
participant Auth as Edge Auth
participant Signal as Signaling Server
participant Session as WebRTC Session
participant Engine as AI Engine (STT/LLM/TTS)
Browser->>FastAPI: GET / (Load JS Client)
FastAPI-->>Browser: HTML/JS
Browser->>FastAPI: WebSocket Connect /ws
FastAPI->>Auth: authenticate_ws()
Auth-->>FastAPI: AuthResult
alt Authorized
FastAPI->>Signal: handle(websocket)
Signal->>Session: Create WebRTC Session
Session->>Browser: SDP Offer/Answer
Browser->>Session: Media Stream
loop Voice Interaction
Session->>Engine: listen(stt)
Engine-->>Session: Utterance Text
Session->>Engine: llm.chat(utterance)
Engine-->>Session: Response Text
Session->>Engine: speak(response, tts)
Engine-->>Session: Audio Bytes
Session->>Browser: Send Audio Track
end
else Unauthorized
FastAPI->>Browser: Close Connection (4001)
end
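The authorization branch in the sequence above can be sketched in plain Python as follows. The names `authenticate_ws`, `AuthResult`, and the close code 4001 come from the diagram; their signatures, the token lookup, and the `FakeWebSocket` class are assumptions made so the example runs without the real packages.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional

UNAUTHORIZED_CLOSE_CODE = 4001  # close code shown in the diagram


@dataclass
class AuthResult:
    # Hypothetical shape; the real AuthResult may carry other fields.
    authorized: bool
    user: Optional[str] = None


class FakeWebSocket:
    """Stand-in for a real WebSocket; records the close code it receives."""

    def __init__(self, token: Optional[str]) -> None:
        self.token = token
        self.closed_with: Optional[int] = None

    async def close(self, code: int) -> None:
        self.closed_with = code


async def authenticate_ws(ws: FakeWebSocket,
                          valid_tokens: Dict[str, str]) -> AuthResult:
    """Assumed check: look the presented token up in a token -> user map."""
    user = valid_tokens.get(ws.token)
    return AuthResult(authorized=user is not None, user=user)


async def ws_endpoint(ws: FakeWebSocket,
                      valid_tokens: Dict[str, str],
                      handle: Callable[[FakeWebSocket], Awaitable[None]]) -> bool:
    """Authenticate first; hand off to signaling only when authorized."""
    result = await authenticate_ws(ws, valid_tokens)
    if not result.authorized:
        await ws.close(code=UNAUTHORIZED_CLOSE_CODE)
        return False
    await handle(ws)
    return True
```

The point of the ordering is that the signaling handler never sees an unauthenticated socket: rejection happens before any session is created.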
Install the required packages via pip. Initialize a FastAPI app and mount the signaling server. Implement your conversation logic in an async handler that receives a WebRTCSession object. Use the session's speak() and listen() methods with your chosen STT/TTS providers. Add auth middleware if securing the endpoint.
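The conversation handler described above might be shaped like this minimal sketch. `FakeSession` is a hypothetical stand-in for the real `WebRTCSession`, the providers are modelled as plain async callables, and the exact signatures of `listen()` and `speak()` are assumptions; only the listen, chat, speak ordering is taken from the description.

```python
import asyncio
from typing import List, Optional


class FakeSession:
    """Hypothetical stand-in for WebRTCSession, fed with canned audio."""

    def __init__(self, utterances: List[bytes]) -> None:
        self._utterances = list(utterances)
        self.spoken: List[bytes] = []

    async def listen(self, stt) -> Optional[str]:
        # Return None once the canned audio runs out (assumed end signal).
        if not self._utterances:
            return None
        return await stt(self._utterances.pop(0))

    async def speak(self, text: str, tts) -> None:
        # Record the synthesized audio instead of sending a media track.
        self.spoken.append(await tts(text))


async def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


class FakeLLM:
    async def chat(self, utterance: str) -> str:
        return f"You said: {utterance}"


async def conversation(session, stt, llm, tts) -> None:
    """Listen, ask the LLM, speak the reply, until the input ends."""
    while True:
        utterance = await session.listen(stt)
        if utterance is None:
            break
        response = await llm.chat(utterance)
        await session.speak(response, tts)
```

In a real app the signaling server would presumably call a handler like `conversation` with a live session; here it can be exercised directly with `asyncio.run(conversation(FakeSession([b"hi"]), fake_stt, FakeLLM(), fake_tts))`.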
✓ all on main — nothing unmerged.