Production-ready real-time speech transcription using NVIDIA Riva 2.19 with a Conformer-CTC streaming model, served through a WebSocket bridge.
https://github.com/davidbmar/nvidia-riva-conformer-streaming · public · shipped
A deployment automation suite and WebSocket bridge that enables real-time, browser-based speech-to-text using NVIDIA Riva's Conformer-CTC-XL model. It handles the complexity of deploying GPU-accelerated ASR on AWS, managing gRPC connections, and serving a secure HTTPS demo interface for immediate testing.
git clone https://github.com/davidbmar/nvidia-riva-conformer-streaming.git
cd nvidia-riva-conformer-streaming
cp .env.example .env
nano .env                                     # set AWS and NGC credentials
./scripts/010-setup-build-box.sh
aws configure
./scripts/020-deploy-gpu-instance.sh          # provision the GPU instance
./scripts/100-deploy-conformer-streaming.sh   # deploy the Riva Conformer-CTC model
./scripts/110-setup-websocket-bridge.sh       # start the WebSocket bridge
./scripts/120-setup-https-demo.sh             # start the HTTPS demo server
echo "Open: https://$(curl -s ifconfig.me):8444"
flowchart TD
Browser[Browser Microphone] -->|WSS Audio Chunks| Bridge[WebSocket Bridge :8443]
Bridge -->|gRPC Streaming| Riva[NVIDIA Riva Server :50051]
Riva -->|Conformer-CTC-XL| GPU[GPU Worker Tesla T4]
Demo[HTTPS Demo UI :8444] -->|Serves HTML/JS| Browser
subgraph BuildBox [Build Box / Controller]
Bridge
Demo
end
subgraph AWS [AWS EC2 Instance]
Riva
GPU
end
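Once everything is deployed, a quick way to confirm that each box in the diagram is listening is a plain TCP reachability check against the ports shown (8443 for the bridge, 8444 for the demo, 50051 for Riva). The sketch below is a hypothetical helper, not part of the repository; the hostnames, and in particular the `riva-gpu-host` placeholder for the EC2 instance, are assumptions to replace with real addresses.

```python
import socket
from typing import Dict, Tuple

# Ports come from the architecture diagram; hostnames are placeholders (assumption).
SERVICES: Dict[str, Tuple[str, int]] = {
    "websocket-bridge": ("localhost", 8443),
    "https-demo": ("localhost", 8444),
    "riva-grpc": ("riva-gpu-host", 50051),  # hypothetical hostname of the GPU instance
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures (socket.gaierror).
        return False


def check_services(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its TCP port accepted a connection."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}
```

A successful connect only shows that something is listening; it does not prove the Riva model is loaded or that the bridge's TLS setup is valid, so treat it as a first-pass check.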
The system uses shell scripts for infrastructure provisioning (AWS EC2 g4dn instances, NVIDIA drivers, Docker) and Python for the application layer. The core logic consists of a WebSocket-to-gRPC bridge (`riva_websocket_bridge.py`) that translates browser audio chunks into Riva streaming API calls, and a `TranscriptAccumulator` that reconciles partial and final hypotheses to prevent word loss during streaming.
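To illustrate what partial/final reconciliation involves, here is a minimal sketch of an accumulator. It is not the repository's `TranscriptAccumulator`; it assumes each streaming result carries a transcript string and an `is_final` flag (as Riva streaming results expose via `alternatives[0].transcript` and `is_final`), and that a partial is a revision of the whole in-progress segment rather than a delta.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptAccumulator:
    """Hypothetical sketch of partial/final hypothesis reconciliation."""

    finals: List[str] = field(default_factory=list)  # committed segments
    partial: str = ""                                 # current in-progress segment

    def update(self, transcript: str, is_final: bool) -> str:
        text = transcript.strip()
        if is_final:
            # A final hypothesis supersedes the pending partial for this segment.
            if text:
                self.finals.append(text)
            self.partial = ""
        else:
            # Partials revise the same segment, so replace rather than append.
            self.partial = text
        return self.display()

    def display(self) -> str:
        """Committed text followed by the current partial, for live display."""
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)

    def flush(self) -> str:
        """On stream close, keep a trailing partial instead of dropping it."""
        if self.partial:
            self.finals.append(self.partial)
            self.partial = ""
        return " ".join(self.finals)
```

For example, the updates `("hello", False)`, `("hello wor", False)`, `("hello world", True)`, `("how", False)` display as `"hello"`, `"hello wor"`, `"hello world"`, `"hello world how"`, and `flush()` returns `"hello world how"`. The flush step covers one plausible source of word loss: a stream that closes before Riva sends a final result for the last segment.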
sequenceDiagram
participant User as Browser User
participant WS as WebSocket Bridge
participant Riva as Riva gRPC Server
participant GPU as GPU Model
User->>WS: Connect WSS & Start Stream
WS->>Riva: Init Streaming Recognize Request
loop Audio Streaming
User->>WS: Send Audio Chunk
WS->>Riva: Stream Audio Data
Riva->>GPU: Process Conformer-CTC
GPU-->>Riva: Return Partial/Final Hypothesis
Riva-->>WS: Stream Response
WS-->>User: Send JSON Transcript
end
User->>WS: Stop Stream
WS->>Riva: Close Stream
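The "Send Audio Chunk" step in the loop above depends on how raw audio is framed. The sketch below assumes 16 kHz mono, 16-bit PCM and a hypothetical 100 ms chunk duration; the repository's actual sample rate and chunk size may differ. On the Riva side, each chunk would typically travel as the `audio_content` of a streaming recognize request sent after the initial configuration request.

```python
from typing import Iterator, Optional

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono audio
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed PCM
CHUNK_MS = 100           # hypothetical chunk duration


def chunk_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     chunk_ms: int = CHUNK_MS,
                     bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Bytes per chunk: samples in `chunk_ms` milliseconds times bytes per sample."""
    return sample_rate_hz * chunk_ms // 1000 * bytes_per_sample


def iter_audio_chunks(pcm: bytes, size: Optional[int] = None) -> Iterator[bytes]:
    """Yield consecutive fixed-size slices of `pcm`; the last one may be shorter."""
    size = size or chunk_size_bytes()
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

With these defaults, one chunk is 16000 × 0.1 × 2 = 3200 bytes, so 8000 bytes of audio split into chunks of 3200, 3200 and 1600 bytes. Because the size is a whole number of samples, no 16-bit sample is ever split across two chunks.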
Clone the repository, configure AWS and NGC credentials in `.env`, and run the sequential setup scripts to provision the GPU instance, deploy the Riva model, and start the WebSocket bridge and HTTPS demo server.
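Since the setup scripts read credentials from `.env`, it can help to check that file before running them. This is a minimal, hypothetical pre-flight check; the key names in `REQUIRED_KEYS` are placeholders, and the real names are whatever `.env.example` defines. The parser handles only simple `KEY=value` lines, not full shell syntax.

```python
from typing import Dict, Iterable, List

# Placeholder key names (assumption); copy the real ones from .env.example.
REQUIRED_KEYS = ("AWS_REGION", "NGC_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop surrounding quotes, e.g. KEY="value".
        values[key] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]
```

Typical use would be `missing_keys(parse_env(Path(".env").read_text()))` with `from pathlib import Path`, stopping before `./scripts/020-deploy-gpu-instance.sh` if the list is not empty.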
✓ all on main — nothing unmerged.