NVIDIA Riva Conformer-CTC Streaming ASR

Production-ready, real-time speech transcription using NVIDIA Riva 2.19 with the Conformer-CTC streaming model, exposed to browsers through a WebSocket bridge.

https://github.com/davidbmar/nvidia-riva-conformer-streaming  ·  public  ·  shipped

What it is

A deployment automation suite and WebSocket bridge that enables real-time, browser-based speech-to-text using NVIDIA Riva's Conformer-CTC-XL model. It handles the complexity of deploying GPU-accelerated ASR on AWS, managing gRPC connections, and serving a secure HTTPS demo interface for immediate testing.

Quickstart

git clone https://github.com/davidbmar/nvidia-riva-conformer-streaming.git
cd nvidia-riva-conformer-streaming
cp .env.example .env
nano .env
./scripts/010-setup-build-box.sh
aws configure
./scripts/020-deploy-gpu-instance.sh
./scripts/100-deploy-conformer-streaming.sh
./scripts/110-setup-websocket-bridge.sh
./scripts/120-setup-https-demo.sh
echo "Open: https://$(curl -s ifconfig.me):8444"

Architecture

flowchart TD
    Browser[Browser Microphone] -->|WSS Audio Chunks| Bridge[WebSocket Bridge :8443]
    Bridge -->|gRPC Streaming| Riva[NVIDIA Riva Server :50051]
    Riva -->|Conformer-CTC-XL| GPU[GPU Worker Tesla T4]
    Demo[HTTPS Demo UI :8444] -->|Serves HTML/JS| Browser
    subgraph BuildBox [Build Box / Controller]
        Bridge
        Demo
    end
    subgraph AWS [AWS EC2 Instance]
        Riva
        GPU
    end
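
After deployment, a quick way to confirm that the three endpoints in the diagram are listening is a plain TCP connection test. The sketch below is illustrative only: the port numbers come from the diagram, while the host names, `ENDPOINTS`, `is_reachable`, and `check_all` are hypothetical and not part of the repository.

```python
import socket

# Ports are taken from the architecture diagram; host names are placeholders
# (assumptions) to be replaced with the real build box and GPU instance.
ENDPOINTS = {
    "websocket-bridge": ("build-box.example", 8443),
    "https-demo": ("build-box.example", 8444),
    "riva-grpc": ("gpu-instance.example", 50051),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_all(endpoints: dict) -> dict:
    """Map each endpoint name to True/False reachability."""
    return {
        name: is_reachable(host, port)
        for name, (host, port) in endpoints.items()
    }


if __name__ == "__main__":
    for name, ok in check_all(ENDPOINTS).items():
        print(f"{name}: {'up' if ok else 'unreachable'}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify TLS, the WebSocket handshake, or that the Riva model is loaded.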

How it's built

The system uses shell scripts for infrastructure provisioning (AWS EC2 g4dn instances, NVIDIA drivers, Docker) and Python for the application layer. The core logic consists of a WebSocket-to-gRPC bridge (`riva_websocket_bridge.py`), which translates browser audio chunks into Riva API calls, and a `TranscriptAccumulator`, which reconciles partial and final hypotheses so that words are not lost during streaming.
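
To make the partial/final reconciliation concrete, here is a minimal sketch of one way such an accumulator could work. It is not the repository's `TranscriptAccumulator`: the method names (`update`, `display`, `flush`) and the reconciliation rule (a final replaces the current partial, and a leftover partial is promoted when the stream stops) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptAccumulator:
    """Hypothetical sketch: keeps finalized text separate from the live partial."""

    finals: list = field(default_factory=list)
    partial: str = ""

    def update(self, text: str, is_final: bool) -> str:
        """Fold one hypothesis into the transcript and return the display text."""
        text = text.strip()
        if is_final:
            # A final hypothesis supersedes the partial it grew out of.
            if text:
                self.finals.append(text)
            self.partial = ""
        else:
            # A partial replaces the previous partial and never touches finals.
            self.partial = text
        return self.display()

    def display(self) -> str:
        """Committed segments followed by the current in-flight partial."""
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)

    def flush(self) -> str:
        """On stream stop, promote a leftover partial so its words are kept."""
        if self.partial:
            self.finals.append(self.partial)
            self.partial = ""
        return self.display()
```

Keeping finals and the partial in separate fields means a late or revised partial can never overwrite text that was already committed, which is the failure mode that typically drops words in streaming transcripts.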

How it runs

sequenceDiagram
    participant User as Browser User
    participant WS as WebSocket Bridge
    participant Riva as Riva gRPC Server
    participant GPU as GPU Model
    
    User->>WS: Connect WSS & Start Stream
    WS->>Riva: Init Streaming Recognize Request
    loop Audio Streaming
        User->>WS: Send Audio Chunk
        WS->>Riva: Stream Audio Data
        Riva->>GPU: Process Conformer-CTC
        GPU-->>Riva: Return Partial/Final Hypothesis
        Riva-->>WS: Stream Response
        WS-->>User: Send JSON Transcript
    end
    User->>WS: Stop Stream
    WS->>Riva: Close Stream
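
The message flow above can be sketched as a small producer-consumer loop. The example below uses only the standard library: `fake_recognizer` stands in for the Riva streaming call and plain `asyncio.Queue` objects stand in for the WebSocket connection. All names, the `None` stop marker, and the JSON message shape are assumptions for illustration, not the repository's actual protocol.

```python
import asyncio
import json
from typing import AsyncIterator


async def fake_recognizer(chunks: AsyncIterator[bytes]) -> AsyncIterator[dict]:
    """Stand-in for the Riva streaming call (an assumption, not the Riva API).

    Emits one partial per audio chunk and one final when the audio ends.
    """
    total = 0
    async for chunk in chunks:
        total += len(chunk)
        yield {"type": "partial", "text": f"{total} bytes heard"}
    yield {"type": "final", "text": f"{total} bytes heard"}


async def bridge_session(incoming: asyncio.Queue, outgoing: asyncio.Queue) -> None:
    """Relay audio from the browser side to the recognizer and results back."""

    async def audio_chunks() -> AsyncIterator[bytes]:
        # Consumer side of the audio queue; None is the "Stop Stream" marker.
        while True:
            chunk = await incoming.get()
            if chunk is None:
                return
            yield chunk

    async for result in fake_recognizer(audio_chunks()):
        # Each hypothesis goes back to the browser as a JSON text message.
        await outgoing.put(json.dumps(result))


async def demo() -> list:
    incoming: asyncio.Queue = asyncio.Queue()
    outgoing: asyncio.Queue = asyncio.Queue()
    # Producer: the "browser" sends two audio chunks, then stops the stream.
    for item in (b"\x00" * 320, b"\x00" * 320, None):
        incoming.put_nowait(item)
    await bridge_session(incoming, outgoing)
    return [json.loads(outgoing.get_nowait()) for _ in range(outgoing.qsize())]


if __name__ == "__main__":
    for message in asyncio.run(demo()):
        print(message)
```

In a real bridge the queues would be fed by WebSocket receive and send calls, and the stand-in recognizer would be replaced by the gRPC streaming request to the Riva server on port 50051.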

How to apply & reuse

Clone the repository, configure AWS and NGC credentials in `.env`, and run the sequential setup scripts to provision the GPU instance, deploy the Riva model, and start the WebSocket bridge and HTTPS demo server.
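
Because the setup scripts depend on the values in `.env`, it can help to check the file before running them. The sketch below is a hedged example: the key names in `REQUIRED_KEYS` are hypothetical placeholders (the authoritative list is whatever `.env.example` defines), and `parse_env` handles only simple `KEY=VALUE` lines.

```python
from pathlib import Path

# Hypothetical key names for illustration; consult .env.example for the real ones.
REQUIRED_KEYS = ("NGC_API_KEY", "AWS_REGION")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    gaps = missing_keys(values)
    print("Missing: " + ", ".join(gaps) if gaps else "All required keys are set.")
```

Running this from the repository root before the deployment scripts surfaces empty or missing entries early, rather than partway through provisioning.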

At a glance

Capabilities: Streaming ASR, WebSocket Proxy, Cloud Deployment, GPU Acceleration, Real-time Transcription
Components: riva_websocket_bridge.py, riva_client.py, transcript_accumulator.py, simple_https_server.py, deployment scripts
Tech: Python, Shell, NVIDIA Riva, gRPC, WebSockets, AWS EC2, Docker, Conformer-CTC
Depends on: Ubuntu 20.04/22.04, AWS CLI, NGC API Key, NVIDIA Drivers, Docker + NVIDIA Container Toolkit
Integrates with: AWS EC2, NVIDIA NGC, Web Browsers, Systemd
Patterns: WebSocket-to-gRPC Bridge, Infrastructure as Code (Shell), Producer-Consumer (Audio Stream), Service Wrapper
Reuse tags: speech-recognition, nvidia-riva, aws-deployment, streaming-asr, websocket-bridge

Repo hygiene

✓ all on main — nothing unmerged.