NVIDIA Riva Conformer-CTC Streaming ASR

What it is

A deployment automation suite and WebSocket bridge for real-time, browser-based speech-to-text using NVIDIA Riva's Conformer-CTC-XL model. It automates deploying GPU-accelerated ASR on AWS, manages the gRPC connection to Riva, and serves a secure HTTPS demo interface for immediate testing.

Features

Real-time browser streaming transcription via microphone input
Conformer-CTC-XL model with 40ms timestep for high accuracy
Secure WebSocket (WSS) bridge connecting browser to Riva gRPC
Automated AWS GPU instance provisioning and lifecycle management
Production-ready systemd services with health checks and auto-restart
Cost-optimized startup/shutdown scripts for overnight GPU savings
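
To make the 40 ms timestep in the list above concrete, the arithmetic below reads it as the spacing between model output frames. The 16 kHz sample rate and 16-bit mono PCM format are assumptions chosen for illustration, not values confirmed by this project.

```python
TIMESTEP_MS = 40          # from the feature list above
SAMPLE_RATE_HZ = 16_000   # assumption: typical ASR input rate
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM

# Number of model output frames produced per second of audio.
frames_per_second = 1000 // TIMESTEP_MS

# Raw audio covered by a single output frame.
samples_per_frame = SAMPLE_RATE_HZ * TIMESTEP_MS // 1000
bytes_per_frame = samples_per_frame * BYTES_PER_SAMPLE

print(frames_per_second, samples_per_frame, bytes_per_frame)
```

Under these assumptions, one second of audio yields 25 output frames, and each frame covers 640 samples (1280 bytes).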

Quickstart

git clone https://github.com/davidbmar/nvidia-riva-conformer-streaming.git
cd nvidia-riva-conformer-streaming
cp .env.example .env
nano .env
./scripts/010-setup-build-box.sh
aws configure
./scripts/020-deploy-gpu-instance.sh
./scripts/100-deploy-conformer-streaming.sh
./scripts/110-setup-websocket-bridge.sh
./scripts/120-setup-https-demo.sh
echo "Open: https://$(curl -s ifconfig.me):8444"

Architecture

flowchart TD
    Browser[Browser Microphone] -->|WSS Audio Chunks| Bridge[WebSocket Bridge :8443]
    Bridge -->|gRPC Streaming| Riva[NVIDIA Riva Server :50051]
    Riva -->|Conformer-CTC-XL| GPU[GPU Worker Tesla T4]
    Demo[HTTPS Demo UI :8444] -->|Serves HTML/JS| Browser
    subgraph BuildBox [Build Box / Controller]
        Bridge
        Demo
    end
    subgraph AWS [AWS EC2 Instance]
        Riva
        GPU
    end
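
The ports in the diagram above (8443 for the bridge, 8444 for the demo UI, 50051 for Riva gRPC) can be checked with a plain TCP probe before opening the browser. This is a minimal sketch using only the standard library; the host values are placeholders and the helper names are hypothetical, not part of the repository. A successful TCP connection only shows that something is listening, not that the service is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints: dict) -> dict:
    """Map each endpoint name to whether its (host, port) accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    # Placeholder hosts: replace with the build box and GPU instance addresses.
    endpoints = {
        "websocket-bridge": ("127.0.0.1", 8443),
        "https-demo": ("127.0.0.1", 8444),
        "riva-grpc": ("127.0.0.1", 50051),
    }
    for name, ok in check_endpoints(endpoints).items():
        print(f"{name}: {'reachable' if ok else 'unreachable'}")
```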

How it's built

The system uses shell scripts for infrastructure provisioning (AWS EC2 g4dn instances, NVIDIA drivers, Docker) and Python for the application layer. The core logic has two parts: a WebSocket-to-gRPC bridge (`riva_websocket_bridge.py`) that translates browser audio chunks into Riva API calls, and a `TranscriptAccumulator` that reconciles partial and final hypotheses so that words are not lost during streaming.
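
The partial/final reconciliation described above can be illustrated with a small stand-in class. This is a sketch of one plausible approach, not the repository's `TranscriptAccumulator`: the class name, method names, and the rule of committing a pending partial when an empty final arrives are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptAccumulatorSketch:
    """Keeps committed final segments separate from the latest partial hypothesis."""

    finals: list = field(default_factory=list)
    partial: str = ""

    def update(self, text: str, is_final: bool) -> str:
        """Apply one recognizer result and return the full display transcript."""
        text = text.strip()
        if is_final:
            # Assumption: if the final is empty, commit the pending partial
            # instead of discarding it, so no displayed words disappear.
            committed = text or self.partial
            if committed:
                self.finals.append(committed)
            self.partial = ""
        else:
            # Partials are revisable, so each one overwrites the previous partial.
            self.partial = text
        return self.transcript()

    def transcript(self) -> str:
        """Join committed segments with the current partial, if any."""
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


acc = TranscriptAccumulatorSketch()
acc.update("hello wor", is_final=False)
acc.update("hello world", is_final=True)
print(acc.update("how are", is_final=False))
```

Keeping finals in an append-only list means a later partial can never overwrite text that the recognizer has already committed.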

How it runs

sequenceDiagram
    participant User as Browser User
    participant WS as WebSocket Bridge
    participant Riva as Riva gRPC Server
    participant GPU as GPU Model
    
    User->>WS: Connect WSS & Start Stream
    WS->>Riva: Init Streaming Recognize Request
    loop Audio Streaming
        User->>WS: Send Audio Chunk
        WS->>Riva: Stream Audio Data
        Riva->>GPU: Process Conformer-CTC
        GPU-->>Riva: Return Partial/Final Hypothesis
        Riva-->>WS: Stream Response
        WS-->>User: Send JSON Transcript
    end
    User->>WS: Stop Stream
    WS->>Riva: Close Stream
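
The loop in the sequence above is a producer-consumer relay: audio chunks flow one way, JSON transcripts flow back. The sketch below mimics that shape with `asyncio` queues and a fake recognizer in place of the Riva gRPC call, so it runs without a GPU or network. The message fields `text` and `is_final` and all function names are assumptions for illustration.

```python
import asyncio
import json


async def fake_recognizer(audio_queue: asyncio.Queue, result_queue: asyncio.Queue) -> None:
    """Stand-in for the Riva streaming call: one partial per chunk, then a final."""
    chunks = 0
    while True:
        chunk = await audio_queue.get()
        if chunk is None:  # sentinel: the client stopped the stream
            break
        chunks += 1
        await result_queue.put({"text": f"chunk {chunks}", "is_final": False})
    await result_queue.put({"text": f"{chunks} chunks received", "is_final": True})
    await result_queue.put(None)  # signal that the response stream is closed


async def bridge_session(incoming: list) -> list:
    """Relay audio chunks to the recognizer and collect the JSON messages sent back."""
    audio_queue = asyncio.Queue()
    result_queue = asyncio.Queue()
    recognizer = asyncio.create_task(fake_recognizer(audio_queue, result_queue))

    # Producer side: forward each browser chunk, then the stop sentinel.
    for chunk in incoming:
        await audio_queue.put(chunk)
    await audio_queue.put(None)

    # Consumer side: serialize each result as it would be sent over the WebSocket.
    sent = []
    while (result := await result_queue.get()) is not None:
        sent.append(json.dumps(result))
    await recognizer
    return sent


messages = asyncio.run(bridge_session([b"\x00\x01", b"\x02\x03"]))
for message in messages:
    print(message)
```

In a real bridge the producer and consumer would run concurrently for the lifetime of the WebSocket connection; they are sequential here only to keep the example short.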

How to apply & reuse

Clone the repository, configure AWS and NGC credentials in `.env`, and run the sequential setup scripts to provision the GPU instance, deploy the Riva model, and start the WebSocket bridge and HTTPS demo server.
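
Before running the setup scripts, it can help to confirm that `.env` actually contains the credentials they need. The sketch below parses simple `KEY=VALUE` lines and reports missing entries. The key names `NGC_API_KEY` and `AWS_REGION` are hypothetical examples; check `.env.example` for the names the scripts really expect.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required: list) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical file contents and key names, for illustration only.
sample = 'NGC_API_KEY=abc123\nAWS_REGION="us-east-2"\n'
required = ["NGC_API_KEY", "AWS_REGION"]
print(missing_keys(parse_env(sample), required))
```

To check a real file, pass the result of `open(".env").read()` to `parse_env` instead of the sample string.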

At a glance

Capabilities: Streaming ASR, WebSocket Proxy, Cloud Deployment, GPU Acceleration, Real-time Transcription

Components: `riva_websocket_bridge.py`, `riva_client.py`, `transcript_accumulator.py`, `simple_https_server.py`, deployment scripts

Tech: Python, Shell, NVIDIA Riva, gRPC, WebSockets, AWS EC2, Docker, Conformer-CTC

Depends on: Ubuntu 20.04/22.04, AWS CLI, NGC API Key, NVIDIA Drivers, Docker + NVIDIA Container Toolkit

Integrates with: AWS EC2, NVIDIA NGC, Web Browsers, Systemd

Patterns: WebSocket-to-gRPC Bridge, Infrastructure as Code (Shell), Producer-Consumer (Audio Stream), Service Wrapper

Reuse tags: speech-recognition, nvidia-riva, aws-deployment, streaming-asr, websocket-bridge