transcriber-2-pass-riva-conformer-cf-s3-lambda-cognito-adapter-2025-10-14

Hybrid architecture combining NVIDIA Riva Conformer-CTC for real-time streaming ASR with an AWS serverless backend for secure, chunked audio storage and session management.

https://github.com/davidbmar/transcriber-2-pass-riva-conformer-cf-s3-lambda-cognito-adapter-2025-10-14  ·  public  ·  shipped

What it is

A production-grade speech recognition system that splits processing into two paths: a low-latency WebSocket bridge to an NVIDIA Riva GPU instance for immediate transcription, and an AWS serverless API (Lambda + S3 + Cognito) that stores raw audio chunks, manages sessions, and finalizes recordings. Users see transcripts while they are still speaking, and the original audio is archived in S3 independently of the real-time path.

Features

Quickstart

git clone https://github.com/davidbmar/transcriber-2-pass-riva-conformer-cf-s3-lambda-cognito-adapter-2025-10-14
cd transcriber-2-pass-riva-conformer-cf-s3-lambda-cognito-adapter-2025-10-14
cp .env.example .env
nano .env
./scripts/010-setup-build-box.sh
aws configure
./scripts/020-deploy-gpu-instance.sh
./scripts/100-deploy-conformer-streaming.sh
./scripts/110-setup-websocket-bridge.sh
./scripts/120-setup-https-demo.sh
echo "Open: https://$(curl -s ifconfig.me):8444"

Architecture

flowchart TD
    Browser[Browser Microphone] -->|WSS Audio Chunks| WS_Bridge[WebSocket Bridge :8443]
    Browser -->|HTTPS API| API_GW[AWS API Gateway]
    
    subgraph Build_Box [Build Box / EC2]
        WS_Bridge -->|gRPC Streaming| Riva[RIVA 2.19 Conformer CTC]
        Demo[HTTPS Demo UI :8444] --> Browser
    end

    subgraph AWS_Cloud [AWS Serverless Backend]
        API_GW --> Auth[Cognito Authorizer]
        Auth --> Lambda[Lambda Functions]
        Lambda -->|Presign/Store| S3[(S3 Bucket)]
        Lambda -->|Manifest| S3
    end

    Riva -.->|Transcription Text| Browser
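
The bridge's role in the diagram (audio in over WebSocket, transcripts back out) can be illustrated with a small relay loop. This is a minimal sketch, not the repository's bridge: `relay`, `fake_recognize`, and `demo` are hypothetical names, and the recognizer is a stand-in async generator where a real bridge would wrap Riva's gRPC streaming call. The bounded queue is an assumed design choice so that a slow recognizer applies backpressure to the client.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional

# Hypothetical type: takes an async stream of audio bytes, yields transcript text.
# A real bridge would wrap the Riva gRPC streaming call behind this shape.
Recognizer = Callable[[AsyncIterator[bytes]], AsyncIterator[str]]


async def relay(
    incoming: AsyncIterator[bytes],
    recognize: Recognizer,
    send: Callable[[str], Awaitable[None]],
    max_buffered_chunks: int = 32,
) -> None:
    """Forward client audio to the recognizer and push transcripts back."""
    queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue(maxsize=max_buffered_chunks)

    async def pump_audio() -> None:
        # Bounded queue: if the recognizer falls behind, put() waits
        # instead of letting buffered audio grow without limit.
        async for chunk in incoming:
            await queue.put(chunk)
        await queue.put(None)  # sentinel: the client finished sending

    async def audio_stream() -> AsyncIterator[bytes]:
        while True:
            chunk = await queue.get()
            if chunk is None:
                return
            yield chunk

    pump = asyncio.create_task(pump_audio())
    try:
        async for text in recognize(audio_stream()):
            await send(text)
    finally:
        # Stop reading from the client if the recognizer ended early.
        pump.cancel()
        await asyncio.gather(pump, return_exceptions=True)


async def fake_recognize(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    # Stand-in for the GPU recognizer: reports how many bytes it has seen.
    total = 0
    async for chunk in audio:
        total += len(chunk)
        yield f"partial after {total} bytes"


async def demo() -> List[str]:
    async def microphone() -> AsyncIterator[bytes]:
        for chunk in (b"\x00" * 320, b"\x00" * 320):
            yield chunk

    received: List[str] = []

    async def send(text: str) -> None:
        received.append(text)

    await relay(microphone(), fake_recognize, send)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Swapping `fake_recognize` for a wrapper around a real streaming client, and `microphone`/`send` for the two halves of a WebSocket connection, keeps the relay logic unchanged.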

How it's built

The system uses shell scripts for infrastructure provisioning (EC2 g4dn instances, NVIDIA drivers, Docker). The real-time path is a Python WebSocket-to-gRPC bridge that connects browsers to Riva. The storage path consists of TypeScript AWS Lambda functions behind API Gateway, authenticated with Amazon Cognito JWTs. The browser uploads audio chunks directly to S3 through presigned URLs, and the Lambda functions maintain a manifest for each session.
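
To make the session manifest idea concrete, here is a minimal in-memory sketch. It is written in Python for illustration only (the source describes the Lambda functions as TypeScript), and every name in it is an assumption: the `SessionManifest` and `ChunkEntry` classes, the `chunks/` key layout, the `.webm` extension, and the rule that a manifest cannot be sealed while chunks are missing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChunkEntry:
    seq: int         # position of the chunk within the session
    key: str         # S3 object key (hypothetical layout)
    size_bytes: int


@dataclass
class SessionManifest:
    session_id: str
    base_prefix: str
    chunks: List[ChunkEntry] = field(default_factory=list)
    sealed: bool = False

    def chunk_key(self, seq: int) -> str:
        # Zero-padded so lexicographic key order matches playback order.
        return f"{self.base_prefix}/chunks/{seq:06d}.webm"

    def add_chunk(self, seq: int, size_bytes: int) -> ChunkEntry:
        if self.sealed:
            raise ValueError("manifest is sealed; no further chunks accepted")
        if any(c.seq == seq for c in self.chunks):
            raise ValueError(f"chunk {seq} already recorded")
        entry = ChunkEntry(seq, self.chunk_key(seq), size_bytes)
        self.chunks.append(entry)
        self.chunks.sort(key=lambda c: c.seq)  # tolerate out-of-order uploads
        return entry

    def missing(self) -> List[int]:
        # Sequence numbers below the highest seen that were never recorded.
        seen = {c.seq for c in self.chunks}
        last = max(seen, default=-1)
        return [s for s in range(last + 1) if s not in seen]

    def seal(self) -> None:
        gaps = self.missing()
        if gaps:
            raise ValueError(f"cannot seal, missing chunks: {gaps}")
        self.sealed = True
```

For example, recording chunks 1 and 0 in that order leaves `chunks` sorted as 0, 1, and calling `seal()` while chunk 2 is missing but chunk 3 is present raises an error instead of finalizing an incomplete recording.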

How it runs

sequenceDiagram
    participant User as Browser
    participant API as API Gateway/Lambda
    participant S3 as S3 Storage
    participant WS as WebSocket Bridge
    participant Riva as NVIDIA Riva GPU

    Note over User, Riva: Session Setup & Upload
    User->>API: POST /sessions (Create Session)
    API->>User: Return sessionId & basePrefix
    
    loop For each audio chunk
        User->>API: POST /chunks/presign
        API->>User: Return Presigned PUT URL
        User->>S3: PUT Audio Chunk (Direct)
        User->>API: POST /chunks/complete
        API->>S3: Verify Object Exists
        API->>S3: Update Manifest
    end

    Note over User, Riva: Real-time Transcription
    User->>WS: Connect WSS
    WS->>Riva: Init gRPC Streaming
    loop Streaming Audio
        User->>WS: Send Audio Chunk
        WS->>Riva: Stream Audio Data
        Riva->>WS: Return Partial Transcript
        WS->>User: Push Transcript
    end

    User->>API: POST /sessions/{id}/finalize
    API->>S3: Seal Manifest
    API->>User: Session Finalized
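
The upload half of the sequence above can be sketched from the client's side. The endpoint paths and the `sessionId`/`basePrefix` response fields come from the diagram; everything else is an assumption made for illustration, including the `seq`, `size`, and `url` field names, the `upload_session` function, and the `FakeBackend` class that stands in for API Gateway and S3 so the flow runs without AWS credentials.

```python
from typing import Any, Callable, Dict, Iterable, List

PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]  # (path, JSON body) -> JSON response
PutFn = Callable[[str, bytes], None]                      # (presigned URL, raw bytes)


def upload_session(chunks: Iterable[bytes], post: PostFn, put: PutFn) -> str:
    """Create a session, upload each chunk via a presigned URL, then finalize."""
    session = post("/sessions", {})
    session_id = session["sessionId"]

    for seq, data in enumerate(chunks):
        # 1. Ask the API for a presigned PUT URL for this chunk.
        presigned = post("/chunks/presign", {"sessionId": session_id, "seq": seq})
        # 2. Upload directly to storage; the API never handles the audio bytes.
        put(presigned["url"], data)
        # 3. Tell the API the upload finished so it can verify and update the manifest.
        post("/chunks/complete", {"sessionId": session_id, "seq": seq, "size": len(data)})

    post(f"/sessions/{session_id}/finalize", {})
    return session_id


class FakeBackend:
    """In-memory stand-in for the API and S3 (hypothetical, for local runs)."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.objects: Dict[str, bytes] = {}

    def post(self, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append(path)
        if path == "/sessions":
            return {"sessionId": "demo", "basePrefix": "sessions/demo"}
        if path == "/chunks/presign":
            return {"url": f"https://example.invalid/demo/{body['seq']}"}
        return {}

    def put(self, url: str, data: bytes) -> None:
        self.objects[url] = data


if __name__ == "__main__":
    backend = FakeBackend()
    upload_session([b"first chunk", b"second chunk"], backend.post, backend.put)
    print(backend.calls)
```

Passing the transport in as `post` and `put` callables keeps the ordering logic separate from HTTP details; a real client would supply functions that attach the Cognito JWT to API calls and issue a plain HTTP PUT to the presigned URL.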

How to apply & reuse

Use this when you need both instant transcription feedback for users and a permanent archive of the original audio. It suits meeting assistants, call center analytics, and medical dictation, where latency matters but data integrity and security are paramount.

At a glance

Capabilities: Streaming ASR, Audio Archival, Identity Management, Infrastructure Automation, Secure File Upload
Components: NVIDIA Riva Server, WebSocket-gRPC Bridge, AWS Lambda API, Amazon S3, Amazon Cognito, EC2 GPU Instance, Browser Client
Tech: Python, TypeScript, Shell, gRPC, WebSockets, AWS CDK/CLI, Docker, Systemd
Depends on: NVIDIA NGC Account, AWS Account, Ubuntu 20.04/22.04, Node.js, Python 3.8+
Integrates with: Amazon Cognito, Amazon S3, AWS Lambda, API Gateway, NVIDIA Riva Service
Patterns: Presigned URL Upload, WebSocket Proxy, Serverless API, Infrastructure as Code (Shell), JWT Authentication
Reuse tags: speech-recognition, aws-serverless, nvidia-riva, real-time-audio, gpu-computing
