PSTN phone-call transport for Mac-local voice AI, with support for Telnyx and Twilio.
https://github.com/davidbmar/sip-voice-transport · public · shipped
A Python library that bridges PSTN telephony providers (Telnyx, Twilio) with local AI pipelines. It handles webhook ingestion, WebSocket media streaming, and automatic audio codec conversion to provide a consistent 16kHz PCM interface for STT/LLM/TTS applications.
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[cli,dev]"
sip-voice-server --port 8765
python test_call.py --port 8765
flowchart TD
User[Caller] -->|PSTN| Provider[Telnyx/Twilio]
Provider -->|Webhook POST| Handler[SipWebhookHandler]
Handler -->|TeXML/TwiML| Provider
Provider -->|WebSocket Media| Handler
Handler -->|Raw Audio| Transport[SipTransport]
Transport -->|Codec Convert| Codec[AudioCodec]
Codec -->|16kHz PCM| App[Your AI Pipeline]
App -->|16kHz PCM| Transport
Transport -->|Provider Format| Handler
Handler -->|WebSocket Media| Provider
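The WebSocket media hop in the flowchart can be illustrated for the Twilio case, where Media Streams frames arrive as JSON text messages with a base64-encoded audio payload. This is a minimal sketch under that assumption. The helper names `extract_audio` and `build_media_message` are hypothetical and are not taken from this library, and Telnyx framing may differ.

```python
import base64
import json
from typing import Optional


def extract_audio(message: str) -> Optional[bytes]:
    """Return the raw audio bytes from a Twilio-style "media" frame.

    Other events (for example "connected", "start", "stop") carry no
    audio, so None is returned for them.
    """
    event = json.loads(message)
    if event.get("event") != "media":
        return None
    return base64.b64decode(event["media"]["payload"])


def build_media_message(stream_sid: str, audio: bytes) -> str:
    """Wrap outbound audio bytes in a Twilio-style "media" frame."""
    return json.dumps(
        {
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(audio).decode("ascii")},
        }
    )
```

The bytes returned by `extract_audio` are still in the provider's format (μ-law at 8kHz for Twilio), so they would need codec conversion before reaching the AI pipeline.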
The server is built on FastAPI, which handles both the HTTP webhooks and the WebSocket connections. An abstract base class defines the provider interface, with concrete implementations for Telnyx (L16 linear PCM at 16kHz) and Twilio (μ-law at 8kHz). Audio conversion uses NumPy and custom μ-law decoding tables to avoid deprecated dependencies.
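The μ-law path can be sketched with a NumPy lookup table. This is a minimal illustration of standard G.711 μ-law expansion, not the library's actual code: `MULAW_TABLE` and `mulaw_8k_to_pcm16_16k` are hypothetical names, and the 2x upsampling uses plain linear interpolation where a production resampler would normally add a low-pass filter. The deprecated dependency being avoided is presumably the standard library's `audioop` module, which was removed in Python 3.13.

```python
import numpy as np


def _build_mulaw_table() -> np.ndarray:
    """Build the 256-entry G.711 mu-law to 16-bit linear PCM table."""
    codes = np.arange(256, dtype=np.uint8)
    inverted = ~codes  # mu-law bytes are stored bit-inverted
    sign = inverted & 0x80
    exponent = ((inverted >> 4) & 0x07).astype(np.int32)
    mantissa = (inverted & 0x0F).astype(np.int32)
    # Rebuild the magnitude, then remove the 0x84 bias added at encode time.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return np.where(sign != 0, -magnitude, magnitude).astype(np.int16)


MULAW_TABLE = _build_mulaw_table()


def mulaw_8k_to_pcm16_16k(payload: bytes) -> bytes:
    """Decode 8kHz mu-law bytes to 16kHz little-endian 16-bit PCM.

    Assumption: naive linear interpolation is acceptable for the 2x
    upsampling step; this is a simplification for illustration.
    """
    samples = MULAW_TABLE[np.frombuffer(payload, dtype=np.uint8)]
    if samples.size == 0:
        return b""
    src_positions = np.arange(samples.size)
    dst_positions = np.arange(samples.size * 2) / 2.0
    upsampled = np.interp(dst_positions, src_positions, samples.astype(np.float64))
    return np.round(upsampled).astype("<i2").tobytes()
```

A 20ms Twilio frame of 160 μ-law bytes becomes 320 samples, or 640 bytes, of 16kHz PCM under this scheme.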
sequenceDiagram
participant Caller
participant Provider as Telnyx/Twilio
participant Server as SipWebhookHandler
participant Transport as SipTransport
participant App as AI Pipeline
Caller->>Provider: Initiate Call
Provider->>Server: POST /sip/{provider}/answer
Server-->>Provider: Return TeXML/TwiML with Stream URL
Provider->>Server: WebSocket Connect /sip/media-stream
Server->>Transport: Initialize Session
loop Active Call
Provider->>Server: Audio Frame (Provider Format)
Server->>Transport: Raw Bytes
Transport->>App: Canonical 16kHz PCM
App->>Transport: Response 16kHz PCM
Transport->>Server: Encoded Bytes
Server->>Provider: Audio Frame (Provider Format)
end
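The answer step in the sequence above, where the server returns markup pointing the provider at the media WebSocket, might look like the following for Twilio. This is a sketch only: it builds a TwiML `<Connect><Stream>` document with the standard library, the function name `build_stream_twiml` and the host `example.com` are hypothetical, and the markup the library actually emits (including its TeXML variant for Telnyx) may differ.

```python
from xml.etree import ElementTree as ET


def build_stream_twiml(stream_url: str) -> str:
    """Return a TwiML document that connects the call to a media stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # The provider opens a WebSocket to this URL once the call is answered.
    ET.SubElement(connect, "Stream", {"url": stream_url})
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


if __name__ == "__main__":
    # Hypothetical public host; the path matches the diagram above.
    print(build_stream_twiml("wss://example.com/sip/media-stream"))
```

A webhook handler would return this string with an XML content type in response to the `POST /sip/{provider}/answer` request.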
Integrate by instantiating `SipTransport` and iterating over `receive_audio()` in an async loop. Process each 16kHz PCM chunk through your STT model, generate text responses via an LLM, synthesize speech with TTS, and send the resulting PCM bytes back via `send_audio()`.
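The loop described above can be sketched as follows. Because the constructor arguments and exact signatures of `SipTransport` are not shown here, the example uses a hypothetical `FakeTransport` stand-in that exposes only the two methods named above, and the STT, LLM, and TTS stages are placeholder functions rather than real models.

```python
import asyncio
from typing import AsyncIterator, List


class FakeTransport:
    """Hypothetical stand-in for SipTransport, for illustration only."""

    def __init__(self, incoming: List[bytes]) -> None:
        self._incoming = incoming
        self.sent: List[bytes] = []

    async def receive_audio(self) -> AsyncIterator[bytes]:
        for chunk in self._incoming:
            await asyncio.sleep(0)  # yield control, as a real socket read would
            yield chunk

    async def send_audio(self, pcm: bytes) -> None:
        self.sent.append(pcm)


async def transcribe(pcm: bytes) -> str:
    """Placeholder STT: treats an all-zero chunk as silence."""
    return "" if not any(pcm) else f"{len(pcm)} bytes of speech"


async def generate_reply(text: str) -> str:
    """Placeholder LLM: echoes the transcript."""
    return f"You said: {text}"


async def synthesize(text: str) -> bytes:
    """Placeholder TTS: one silent 16-bit sample per character."""
    return b"\x00\x00" * len(text)


async def run_pipeline(transport) -> None:
    """Receive 16kHz PCM, run STT -> LLM -> TTS, and send PCM back."""
    async for chunk in transport.receive_audio():
        text = await transcribe(chunk)
        if not text:
            continue  # nothing recognized in this chunk
        reply = await generate_reply(text)
        await transport.send_audio(await synthesize(reply))


if __name__ == "__main__":
    demo = FakeTransport([b"\x00" * 640, b"\x01\x00" * 320])
    asyncio.run(run_pipeline(demo))
    print(len(demo.sent), "reply chunk(s) sent")
```

In a real integration the placeholder stages would be replaced with actual models, and chunks would typically be buffered until an utterance boundary rather than transcribed one at a time.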
✓ all on main — nothing unmerged.