sip-voice-transport · davidbmar.com

What it is

A Python library that bridges PSTN telephony providers (Telnyx, Twilio) with local AI pipelines. It handles webhook ingestion, WebSocket media streaming, and automatic audio codec conversion to provide a consistent 16kHz PCM interface for STT/LLM/TTS applications.

Features

Bidirectional audio streaming over WebSocket from Telnyx or Twilio
Automatic codec conversion (mulaw/8kHz to 16kHz PCM) for consistent input
DID-based routing to specific LLM models and system prompts
Mac sleep prevention during active calls using caffeinate
Async iterator interface for seamless integration with AI pipelines
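
DID-based routing comes down to a lookup from the dialed number to a model and system prompt. The sketch below illustrates that idea only. `Route`, `DidRouterSketch`, the phone numbers, and the model names are hypothetical placeholders, not the library's `DIDRouter` API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical routing target: which model and prompt serve a number."""
    model: str
    system_prompt: str


def normalize_did(did: str) -> str:
    """Keep digits only so '+1 (512) 555-0100' and '15125550100' match."""
    return re.sub(r"\D", "", did)


class DidRouterSketch:
    """Illustrative router; the real DIDRouter may be configured differently."""

    def __init__(self, routes: dict[str, Route], default: Route) -> None:
        self._routes = {normalize_did(k): v for k, v in routes.items()}
        self._default = default

    def resolve(self, did: str) -> Route:
        # Fall back to the default route for numbers with no explicit entry.
        return self._routes.get(normalize_did(did), self._default)


# Placeholder numbers and model names, for illustration only.
router = DidRouterSketch(
    routes={
        "+1 (512) 555-0100": Route("llama3", "You are a support agent."),
        "+1 (512) 555-0101": Route("mistral", "You are a sales assistant."),
    },
    default=Route("llama3", "You are a helpful assistant."),
)
```

Normalizing both the configured keys and the incoming number avoids mismatches when a provider reports the DID in a different format than the configuration file uses.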

Quickstart

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[cli,dev]"
sip-voice-server --port 8765
python test_call.py --port 8765

Architecture

flowchart TD
    User[Caller] -->|PSTN| Provider[Telnyx/Twilio]
    Provider -->|Webhook POST| Handler[SipWebhookHandler]
    Handler -->|TeXML/TwiML| Provider
    Provider -->|WebSocket Media| Handler
    Handler -->|Raw Audio| Transport[SipTransport]
    Transport -->|Codec Convert| Codec[AudioCodec]
    Codec -->|16kHz PCM| App[Your AI Pipeline]
    App -->|16kHz PCM| Transport
    Transport -->|Provider Format| Handler
    Handler -->|WebSocket Media| Provider

How it's built

The library is built on FastAPI, which handles both the HTTP webhooks and the WebSocket connections. An abstract base class lets it support multiple telephony providers, with concrete implementations for Telnyx (L16/16kHz) and Twilio (mulaw/8kHz). Audio conversion uses NumPy and custom μ-law decoding tables to avoid deprecated dependencies.
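
To make the table-based μ-law decoding concrete, here is a minimal sketch of the Twilio path: 8kHz μ-law bytes in, 16kHz 16-bit PCM out. It uses only the standard library so it runs anywhere; the library itself is described as using NumPy, which would vectorize the same table lookup. The function names and the linear-interpolation upsampling are assumptions, not the library's `AudioCodec` implementation. For context, the standard library's `audioop` module, which once offered this conversion, was deprecated in Python 3.11 and removed in 3.13; that may be the deprecated dependency referred to above.

```python
import struct


def _decode_mulaw_byte(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


# Precompute all 256 possible values once; decoding is then a table lookup.
MULAW_TABLE = [_decode_mulaw_byte(b) for b in range(256)]


def mulaw8k_to_pcm16k(payload: bytes) -> bytes:
    """Decode mu-law and double the sample rate by linear interpolation.

    Assumption: simple 2x interpolation; a production resampler may
    apply proper low-pass filtering instead.
    """
    samples = [MULAW_TABLE[b] for b in payload]
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)                  # original sample
        out.append((s + nxt) // 2)     # midpoint between neighbours
    return struct.pack(f"<{len(out)}h", *out)  # little-endian int16
```

A 20 ms frame of 8kHz μ-law audio is 160 bytes; after conversion it becomes 320 samples, or 640 bytes, of 16kHz PCM.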

How it runs

sequenceDiagram
    participant Caller
    participant Provider as Telnyx/Twilio
    participant Server as SipWebhookHandler
    participant Transport as SipTransport
    participant App as AI Pipeline
    Caller->>Provider: Initiate Call
    Provider->>Server: POST /sip/{provider}/answer
    Server-->>Provider: Return TeXML/TwiML with Stream URL
    Provider->>Server: WebSocket Connect /sip/media-stream
    Server->>Transport: Initialize Session
    loop Active Call
        Provider->>Server: Audio Frame (Provider Format)
        Server->>Transport: Raw Bytes
        Transport->>App: Canonical 16kHz PCM
        App->>Transport: Response 16kHz PCM
        Transport->>Server: Encoded Bytes
        Server->>Provider: Audio Frame (Provider Format)
    end
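
The "Return TeXML/TwiML with Stream URL" step can be illustrated with a small XML builder. Twilio documents `<Connect><Stream url="...">` as the TwiML for bidirectional Media Streams. Whether the library emits exactly this document, and the `build_stream_answer` name and `example.com` host, are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def build_stream_answer(stream_url: str) -> str:
    """Return an XML answer telling the provider to open a media WebSocket.

    Hypothetical helper: the real handler may add attributes or
    provider-specific elements.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": stream_url})
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


# The path matches the WebSocket endpoint in the diagram; the host is a placeholder.
answer_xml = build_stream_answer("wss://example.com/sip/media-stream")
```

Building the document with `ElementTree` rather than string formatting means the URL is escaped correctly if it ever carries query parameters containing `&`.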

How to apply & reuse

Integrate by instantiating `SipTransport` and iterating over `receive_audio()` in an async loop. Process each 16kHz PCM chunk through your STT model, generate text responses via an LLM, synthesize speech with TTS, and send the resulting PCM bytes back via `send_audio()`.
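
The loop described above might look like the following sketch. Because the constructor arguments of `SipTransport` are not shown here, the example uses a hypothetical `FakeTransport` stand-in that exposes the same two method names. The `transcribe`, `generate_reply`, and `synthesize` functions are placeholders for real STT, LLM, and TTS calls, and treating `send_audio()` as a coroutine is an assumption.

```python
import asyncio
from typing import AsyncIterator, Iterable


class FakeTransport:
    """Hypothetical stand-in for SipTransport, used so the sketch runs offline."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks = list(chunks)
        self.sent: list[bytes] = []

    async def receive_audio(self) -> AsyncIterator[bytes]:
        for chunk in self._chunks:
            await asyncio.sleep(0)     # yield control, as a network read would
            yield chunk

    async def send_audio(self, pcm: bytes) -> None:
        # Assumption: send_audio is awaitable in the real transport.
        self.sent.append(pcm)


def transcribe(pcm: bytes) -> str:
    """Placeholder STT: a real pipeline would call a speech model here."""
    return f"{len(pcm) // 2} samples heard"


def generate_reply(text: str) -> str:
    """Placeholder LLM call."""
    return f"echo: {text}"


def synthesize(text: str) -> bytes:
    """Placeholder TTS: returns silent 16-bit PCM sized by the text length."""
    return b"\x00\x00" * len(text)


async def run_call(transport) -> None:
    # One STT -> LLM -> TTS round trip per incoming 16kHz PCM chunk.
    async for chunk in transport.receive_audio():
        reply = generate_reply(transcribe(chunk))
        await transport.send_audio(synthesize(reply))
```

To try it, build a `FakeTransport` with a few PCM chunks and run `asyncio.run(run_call(transport))`. A production loop would usually buffer audio until an utterance ends rather than replying to every chunk.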

At a glance

Capabilities: Inbound call handling, Real-time audio streaming, Codec transcoding, DID routing, System sleep inhibition

Components: SipTransport, SipWebhookHandler, BaseSipProvider, AudioCodec, DIDRouter, SleepInhibitor

Tech: Python 3.11+, FastAPI, NumPy, Uvicorn, WebSockets

Depends on: fastapi, uvicorn, numpy, pyyaml, httpx

Integrates with: Telnyx, Twilio, Whisper (STT), Ollama (LLM), Piper (TTS)

Patterns: Abstract Base Class, Async Iterator, Strategy Pattern (Providers), Adapter Pattern (Codecs)

Reuse tags: telephony, voip, voice-ai, real-time-audio, macos