voice-frontend-modules

Composable Python infrastructure for voice AI apps: WebRTC transport, edge authentication, and pluggable STT/TTS/LLM engines.

https://github.com/davidbmar/voice-frontend-modules-auth.transport.engine  ·  public  ·  shipped

What it is

A modular toolkit for building real-time voice applications. It splits connectivity (WebRTC/TURN), security (JWT/Cloudflare Access), and AI processing (STT/TTS/LLM) into three independent packages: transport, edge-auth, and engine-starter. Developers can mix and match components, or swap out the reference engine-starter for production-grade providers, while keeping a consistent interface.

Quickstart

pip install "voice-frontend[all]"
uvicorn examples.minimal-voice-app.server:app --port 8090

Architecture

flowchart TD
    Client[Browser/JS Client] -->|WebSocket| Signaling[Signaling Server]
    Client -->|WebRTC Media| Session[WebRTC Session]
    Signaling -->|Auth Check| Auth[Edge Auth Middleware]
    Auth -->|Validate| Providers[Auth Providers]
    Session -->|Audio In| STT[STT Provider]
    STT -->|Text| LLM[LLM Provider]
    LLM -->|Text Response| TTS[TTS Provider]
    TTS -->|Audio Out| Session
    Session -->|Media Stream| Client
    subgraph Infrastructure
        Signaling
        Auth
        Session
    end
    subgraph AI Engine
        STT
        LLM
        TTS
    end
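The Auth → Providers edge implies that one middleware consults several identity sources, such as Cloudflare Access and Google JWT. Below is a minimal sketch of how a CompositeProvider could try each configured provider in order and return the first success. The `authenticate(headers)` method, the `AuthResult` fields, and `StaticTokenProvider` are hypothetical, and the real edge-auth API may differ. Token checking is stubbed with a lookup table rather than real JWT verification, header keys are assumed to be lowercase, and the header names are illustrative.

```python
import asyncio
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class AuthResult:
    # Hypothetical result type: whether access is granted and who the caller is.
    authorized: bool
    identity: Optional[str] = None
    provider: Optional[str] = None


class AuthProvider:
    """Hypothetical provider interface; real providers would verify JWTs."""

    name = "base"

    async def authenticate(self, headers: Mapping[str, str]) -> AuthResult:
        raise NotImplementedError


class StaticTokenProvider(AuthProvider):
    """Stub provider that accepts tokens from a fixed table (illustration only)."""

    def __init__(self, name: str, header: str, tokens: Mapping[str, str]):
        self.name = name
        self.header = header.lower()  # headers are looked up by lowercase key
        self.tokens = dict(tokens)

    async def authenticate(self, headers: Mapping[str, str]) -> AuthResult:
        token = headers.get(self.header)
        identity = self.tokens.get(token) if token else None
        return AuthResult(identity is not None, identity, self.name if identity else None)


class CompositeProvider(AuthProvider):
    """Tries each provider in order and returns the first successful result."""

    name = "composite"

    def __init__(self, providers: Sequence[AuthProvider]):
        self.providers = list(providers)

    async def authenticate(self, headers: Mapping[str, str]) -> AuthResult:
        for provider in self.providers:
            result = await provider.authenticate(headers)
            if result.authorized:
                return result
        return AuthResult(authorized=False)  # no provider accepted the request


cf = StaticTokenProvider("cloudflare", "cf-access-jwt-assertion", {"cf-token": "alice@example.com"})
google = StaticTokenProvider("google", "authorization", {"Bearer g-token": "bob@example.com"})
auth = CompositeProvider([cf, google])
result = asyncio.run(auth.authenticate({"authorization": "Bearer g-token"}))
print(result)  # AuthResult(authorized=True, identity='bob@example.com', provider='google')
```

Ordering the providers lets the cheapest or most common check run first. A request is rejected only when every provider declines it.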

How it's built

The backend targets Python 3.9+ and uses FastAPI for signaling and WebSocket handling. The transport layer manages WebRTC sessions, ICE gathering, and TURN credentials. The edge-auth layer provides authentication middleware for both HTTP and WebSocket endpoints. The engine layer defines abstract base classes (ABCs) for speech-to-text, text-to-speech, and LLM chat. The reference starters (Whisper, Piper, Ollama) implement these ABCs and can be replaced with custom implementations.
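Below is a minimal sketch of the ABC-based engine layer: three abstract interfaces, plus a function that runs one STT → LLM → TTS turn against whichever implementations are injected. The class and method names (`STTProvider.transcribe`, `LLMProvider.chat`, `TTSProvider.synthesize`) and the fake stubs are assumptions for illustration, not the engine-starter's actual API. Real starters would wrap Whisper, Ollama, and Piper.

```python
import asyncio
from abc import ABC, abstractmethod


class STTProvider(ABC):
    @abstractmethod
    async def transcribe(self, audio: bytes) -> str:
        """Convert raw audio into text."""


class LLMProvider(ABC):
    @abstractmethod
    async def chat(self, text: str) -> str:
        """Produce a text reply to the user's utterance."""


class TTSProvider(ABC):
    @abstractmethod
    async def synthesize(self, text: str) -> bytes:
        """Convert reply text into audio bytes."""


# Stub implementations standing in for Whisper / Ollama / Piper.
class FakeSTT(STTProvider):
    async def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is UTF-8 text


class EchoLLM(LLMProvider):
    async def chat(self, text: str) -> str:
        return f"You said: {text}"


class FakeTTS(TTSProvider):
    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # pretend the text encoding is audio


async def run_turn(audio: bytes, stt: STTProvider, llm: LLMProvider, tts: TTSProvider) -> bytes:
    """One conversational turn; any provider can be swapped without touching this code."""
    utterance = await stt.transcribe(audio)
    reply = await llm.chat(utterance)
    return await tts.synthesize(reply)


audio_out = asyncio.run(run_turn(b"hello", FakeSTT(), EchoLLM(), FakeTTS()))
print(audio_out)  # b'You said: hello'
```

Because `run_turn` depends only on the abstract types, replacing a starter with a hosted provider means writing one new subclass. The pipeline itself stays unchanged.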

How it runs

sequenceDiagram
    participant Browser as Browser Client
    participant FastAPI as FastAPI App
    participant Auth as Edge Auth
    participant Signal as Signaling Server
    participant Session as WebRTC Session
    participant Engine as AI Engine (STT/LLM/TTS)

    Browser->>FastAPI: GET / (Load JS Client)
    FastAPI-->>Browser: HTML/JS
    Browser->>FastAPI: WebSocket Connect /ws
    FastAPI->>Auth: authenticate_ws()
    Auth-->>FastAPI: AuthResult
    alt Authorized
        FastAPI->>Signal: handle(websocket)
        Signal->>Session: Create WebRTC Session
        Session->>Browser: SDP Offer/Answer
        Browser->>Session: Media Stream
        loop Voice Interaction
            Session->>Engine: listen(stt)
            Engine-->>Session: Utterance Text
            Session->>Engine: llm.chat(utterance)
            Engine-->>Session: Response Text
            Session->>Engine: speak(response, tts)
            Engine-->>Session: Audio Bytes
            Session->>Browser: Send Audio Track
        end
    else Unauthorized
        FastAPI->>Browser: Close Connection (4001)
    end
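The Authorized/Unauthorized branch above can be sketched as a small gate in front of the signaling handler. This is a minimal sketch under stated assumptions. `FakeWebSocket` stands in for a FastAPI/Starlette `WebSocket`. `authenticate_ws` mirrors the name in the diagram, but its body (a fixed valid token) is hypothetical. The signaling handoff is a stub.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Optional

UNAUTHORIZED_CLOSE_CODE = 4001  # application-defined close code from the diagram


@dataclass
class AuthResult:
    authorized: bool
    identity: Optional[str] = None


@dataclass
class FakeWebSocket:
    """Stand-in for a FastAPI/Starlette WebSocket; records what happened."""
    token: Optional[str]
    accepted: bool = False
    closed_with: Optional[int] = None
    events: List[str] = field(default_factory=list)

    async def accept(self) -> None:
        self.accepted = True

    async def close(self, code: int = 1000) -> None:
        self.closed_with = code


async def authenticate_ws(ws: FakeWebSocket) -> AuthResult:
    # Hypothetical check: a real provider would verify a JWT here.
    if ws.token == "valid-token":
        return AuthResult(True, "alice")
    return AuthResult(False)


async def ws_endpoint(ws: FakeWebSocket, handle: Callable[[FakeWebSocket], Awaitable[None]]) -> None:
    result = await authenticate_ws(ws)
    # Accept first so the client actually receives the custom close code.
    await ws.accept()
    if not result.authorized:
        await ws.close(code=UNAUTHORIZED_CLOSE_CODE)  # reject before any signaling
        return
    await handle(ws)  # hand off to the signaling server


async def fake_signaling(ws: FakeWebSocket) -> None:
    ws.events.append("signaling started")


good, bad = FakeWebSocket("valid-token"), FakeWebSocket("nope")
asyncio.run(ws_endpoint(good, fake_signaling))
asyncio.run(ws_endpoint(bad, fake_signaling))
print(good.events, bad.closed_with)  # ['signaling started'] 4001
```

The sketch accepts before closing for a reason. Under the ASGI spec, a close sent before the handshake is accepted becomes an HTTP 403 rejection, and the client never sees code 4001.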

How to apply & reuse

Install the required packages with pip, create a FastAPI app, and mount the signaling server. Put your conversation logic in an async handler that receives a WebRTCSession object. In that handler, call the session's listen() and speak() methods with your chosen STT and TTS providers, and pass each utterance to your LLM in between. To secure the endpoint, add the edge-auth middleware.
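Below is a minimal sketch of such a handler, following the listen(stt) → llm.chat(utterance) → speak(response, tts) loop from the sequence diagram. `FakeSession` is a stand-in for WebRTCSession and `EchoLLM` is a stub. The sketch assumes that `listen()` returns `None` once the session ends; the real method signatures and end-of-session signal may differ.

```python
import asyncio
from typing import List, Optional


class FakeSession:
    """Stand-in for WebRTCSession: scripted utterances in, spoken replies recorded."""

    def __init__(self, utterances: List[str]):
        self._pending = list(utterances)
        self.spoken: List[str] = []

    async def listen(self, stt) -> Optional[str]:
        # Real session: capture audio until end of speech, then run STT.
        return self._pending.pop(0) if self._pending else None

    async def speak(self, text: str, tts) -> None:
        # Real session: synthesize with TTS and send on the outbound audio track.
        self.spoken.append(text)


class EchoLLM:
    async def chat(self, utterance: str) -> str:
        return f"Echo: {utterance}"


async def conversation(session, stt, llm, tts) -> None:
    """Handler that receives a session and loops until listen() returns None."""
    while True:
        utterance = await session.listen(stt)
        if utterance is None:  # assumed end-of-session signal
            break
        reply = await llm.chat(utterance)
        await session.speak(reply, tts)


session = FakeSession(["hi", "what time is it"])
asyncio.run(conversation(session, stt=None, llm=EchoLLM(), tts=None))
print(session.spoken)  # ['Echo: hi', 'Echo: what time is it']
```

The handler takes the providers as arguments rather than constructing them itself. This keeps provider choice with the app wiring, in line with the dependency-injection pattern listed below.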

At a glance

Capabilities: WebRTC Signaling, TURN Credential Management, WebSocket Authentication, Speech-to-Text Integration, Text-to-Speech Synthesis, LLM Chat Orchestration, Voice Activity Detection, Interruptible Playback
Components: transport, edge-auth, engine-starter, SignalingServer, WebRTCSession, AuthProvider, CompositeProvider, StarterSTT, StarterTTS, StarterLLM
Tech: Python, FastAPI, WebRTC, WebSocket, Whisper, Piper, Ollama, JavaScript
Depends on: fastapi, uvicorn, websockets, aiortc, twilio, kokoro-onnx, openai-whisper
Integrates with: Cloudflare Access, Google JWT, Twilio TURN, Custom STT Providers, Custom TTS Providers, Custom LLM Providers
Patterns: Dependency Injection, Adapter Pattern, Middleware Pattern, Async/Await, Abstract Base Classes
Reuse tags: voice-ai, webrtc, auth-middleware, stt-tts, fastapi, real-time

Repo hygiene

✓ All on main; nothing unmerged.