transcriber-2-pass-riva-conformer-cf-s3-lambda-cognito-adapter-2025-10-14

What it is

A production-grade speech recognition system that splits processing into two paths: a low-latency WebSocket bridge to an NVIDIA Riva GPU instance for immediate transcription, and an AWS serverless API (API Gateway + Lambda + S3 + Cognito) that stores raw audio chunks, manages sessions, and finalizes recordings. Users get transcripts while they speak, and the original audio is archived in S3 independently of the real-time path.

Features

Real-time browser streaming transcription via NVIDIA Riva Conformer-CTC-XL
Secure WebSocket (WSS) bridge from browser to GPU gRPC endpoint
Serverless audio chunk storage using S3 presigned URLs and Lambda
Amazon Cognito JWT authentication for all API endpoints
Automated GPU instance lifecycle management (start/stop/shutdown)
Session manifest tracking for multi-chunk audio assembly

Quickstart

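# Clone the repository and enter it
git clone https://github.com/davidbmar/transcriber-2-pass-riva-conformer-cf-s3-lambda-cognito-adapter-2025-10-14
cd transcriber-2-pass-riva-conformer-cf-s3-lambda-cognito-adapter-2025-10-14
# Create a local config from the template and fill in your values
cp .env.example .env
nano .env
# Prepare the build box, then configure AWS credentials
./scripts/010-setup-build-box.sh
aws configure
# Deploy the GPU instance, the Conformer streaming model, the WebSocket bridge, and the HTTPS demo
./scripts/020-deploy-gpu-instance.sh
./scripts/100-deploy-conformer-streaming.sh
./scripts/110-setup-websocket-bridge.sh
./scripts/120-setup-https-demo.sh
# Print the demo URL using this machine's public IP
echo "Open: https://$(curl -s ifconfig.me):8444"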

Architecture

flowchart TD
    Browser[Browser Microphone] -->|WSS Audio Chunks| WS_Bridge[WebSocket Bridge :8443]
    Browser -->|HTTPS API| API_GW[AWS API Gateway]
    
    subgraph Build_Box [Build Box / EC2]
        WS_Bridge -->|gRPC Streaming| Riva[RIVA 2.19 Conformer CTC]
        Demo[HTTPS Demo UI :8444] --> Browser
    end

    subgraph AWS_Cloud [AWS Serverless Backend]
        API_GW --> Auth[Cognito Authorizer]
        Auth --> Lambda[Lambda Functions]
        Lambda -->|Presign/Store| S3[(S3 Bucket)]
        Lambda -->|Manifest| S3
    end

    Riva -.->|Transcription Text| Browser
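
The real-time path in the diagram (browser to bridge to Riva and back) can be sketched as a relay loop. This is a minimal illustration using only asyncio, not the repository's bridge: `fake_recognizer` is a stand-in for the Riva gRPC streaming call, and `relay`, the queue, and the message shapes are hypothetical names chosen for this example.

```python
import asyncio
from typing import AsyncIterator, List, Optional


async def fake_recognizer(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in for a streaming ASR call (assumption, not the Riva API).

    Emits one 'partial transcript' per audio chunk received.
    """
    total = 0
    async for chunk in audio:
        total += len(chunk)
        yield f"partial: {total} bytes heard"


async def relay(incoming: "asyncio.Queue[Optional[bytes]]", outgoing: List[str]) -> None:
    """Forward queued audio chunks to the recognizer and collect transcripts.

    `incoming` models frames arriving from the browser socket; None marks
    end of stream. `outgoing` models messages pushed back to the browser.
    """

    async def audio_stream() -> AsyncIterator[bytes]:
        while True:
            chunk = await incoming.get()
            if chunk is None:
                return
            yield chunk

    async for text in fake_recognizer(audio_stream()):
        outgoing.append(text)


async def demo() -> List[str]:
    incoming: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    outgoing: List[str] = []
    # Two fake audio frames followed by the end-of-stream marker.
    for chunk in (b"\x00" * 320, b"\x00" * 320, None):
        incoming.put_nowait(chunk)
    await relay(incoming, outgoing)
    return outgoing


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the shape is that audio flows in and transcripts flow out concurrently, so partial results can reach the browser before the speaker has finished.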

How it's built

Shell scripts provision the infrastructure (EC2 g4dn instances, NVIDIA drivers, Docker). The real-time path is a Python WebSocket-to-gRPC bridge that connects browsers to Riva. The storage path is a set of TypeScript AWS Lambda functions behind API Gateway, authenticated with Amazon Cognito JWTs. Browsers upload audio chunks directly to S3 through presigned URLs, and the functions maintain a manifest for each session.
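
The session manifest mentioned above can be pictured as a small JSON document stored alongside the chunks. The following is a hedged sketch in Python (the repository's Lambda functions are TypeScript). Every field name, the `chunk_key` layout including the `.webm` extension, and the `SessionManifest` class are assumptions made for illustration, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


def chunk_key(base_prefix: str, seq: int) -> str:
    """Hypothetical S3 key layout: zero-padded so keys sort in order."""
    return f"{base_prefix.rstrip('/')}/chunks/{seq:05d}.webm"


@dataclass
class SessionManifest:
    session_id: str
    base_prefix: str
    chunks: List[Dict[str, object]] = field(default_factory=list)
    sealed: bool = False

    def add_chunk(self, seq: int, size_bytes: int) -> None:
        """Record a completed chunk; refuse changes once sealed."""
        if self.sealed:
            raise ValueError("manifest is sealed")
        if any(c["seq"] == seq for c in self.chunks):
            return  # idempotent: a retried 'complete' call is a no-op
        self.chunks.append(
            {"seq": seq, "key": chunk_key(self.base_prefix, seq), "size": size_bytes}
        )
        self.chunks.sort(key=lambda c: c["seq"])

    def seal(self) -> None:
        """Mark the session final so the chunk list can be assembled."""
        self.sealed = True

    def to_json(self) -> str:
        """Serialize the manifest as it might be written to S3."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping chunks sorted by sequence number and making `add_chunk` idempotent means out-of-order or retried uploads still produce the same manifest, which is what multi-chunk assembly needs.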

How it runs

sequenceDiagram
    participant User as Browser
    participant API as API Gateway/Lambda
    participant S3 as S3 Storage
    participant WS as WebSocket Bridge
    participant Riva as NVIDIA Riva GPU

    Note over User, Riva: Session Setup & Upload
    User->>API: POST /sessions (Create Session)
    API->>User: Return sessionId & basePrefix
    
    loop For each audio chunk
        User->>API: POST /chunks/presign
        API->>User: Return Presigned PUT URL
        User->>S3: PUT Audio Chunk (Direct)
        User->>API: POST /chunks/complete
        API->>S3: Verify Object Exists
        API->>S3: Update Manifest
    end

    Note over User, Riva: Real-time Transcription
    User->>WS: Connect WSS
    WS->>Riva: Init gRPC Streaming
    loop Streaming Audio
        User->>WS: Send Audio Chunk
        WS->>Riva: Stream Audio Data
        Riva->>WS: Return Partial Transcript
        WS->>User: Push Transcript
    end

    User->>API: POST /sessions/{id}/finalize
    API->>S3: Seal Manifest
    API->>User: Session Finalized
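
The upload half of the sequence above (create session, presign, PUT, complete, finalize) can be walked through with an in-memory stand-in. This is a sketch under stated assumptions: `FakeBackend` replaces API Gateway, Lambda, and S3 entirely, the key format is invented, and only the endpoint names and the `sessionId` / `basePrefix` fields come from the diagram.

```python
import uuid
from typing import Dict, List


class FakeBackend:
    """In-memory stand-in for the API + S3 (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}   # plays the role of the S3 bucket
        self.manifests: Dict[str, dict] = {}  # one manifest per session

    def create_session(self) -> dict:  # POST /sessions
        session_id = uuid.uuid4().hex
        self.manifests[session_id] = {"chunks": [], "sealed": False}
        return {"sessionId": session_id, "basePrefix": f"sessions/{session_id}/"}

    def presign(self, base_prefix: str, seq: int) -> str:  # POST /chunks/presign
        # A real API would return a signed URL; here the key doubles as the URL.
        return f"{base_prefix}chunk-{seq:05d}"

    def put(self, url: str, body: bytes) -> None:  # PUT to the presigned URL
        self.objects[url] = body

    def complete(self, session_id: str, url: str) -> None:  # POST /chunks/complete
        if url not in self.objects:  # "Verify Object Exists"
            raise FileNotFoundError(url)
        manifest = self.manifests[session_id]
        if manifest["sealed"]:
            raise ValueError("session already finalized")
        manifest["chunks"].append(url)  # "Update Manifest"

    def finalize(self, session_id: str) -> dict:  # POST /sessions/{id}/finalize
        self.manifests[session_id]["sealed"] = True  # "Seal Manifest"
        return self.manifests[session_id]


def upload_recording(api: FakeBackend, chunks: List[bytes]) -> dict:
    """Run the per-chunk loop from the diagram, then finalize the session."""
    session = api.create_session()
    for seq, body in enumerate(chunks):
        url = api.presign(session["basePrefix"], seq)
        api.put(url, body)
        api.complete(session["sessionId"], url)
    return api.finalize(session["sessionId"])
```

Calling `upload_recording(FakeBackend(), [b"a", b"bb"])` returns a sealed manifest listing two chunk keys. The `complete` step only records a chunk after confirming the object exists, so the manifest never references audio that failed to upload.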

How to apply & reuse

Use this when you need both instant transcription feedback for users and a permanent, searchable archive of the original audio. Ideal for meeting assistants, call center analytics, or medical dictation where latency matters but data integrity and security are paramount.

At a glance

Capabilities: Streaming ASR, Audio Archival, Identity Management, Infrastructure Automation, Secure File Upload

Components: NVIDIA Riva Server, WebSocket-gRPC Bridge, AWS Lambda API, Amazon S3, Amazon Cognito, EC2 GPU Instance, Browser Client

Tech: Python, TypeScript, Shell, gRPC, WebSockets, AWS CDK/CLI, Docker, Systemd

Depends on: NVIDIA NGC Account, AWS Account, Ubuntu 20.04/22.04, Node.js, Python 3.8+

Integrates with: Amazon Cognito, Amazon S3, AWS Lambda, API Gateway, NVIDIA Riva Service

Patterns: Presigned URL Upload, WebSocket Proxy, Serverless API, Infrastructure as Code (Shell), JWT Authentication

Reuse tags: speech-recognition, aws-serverless, nvidia-riva, real-time-audio, gpu-computing

⚠ Needs attention

unmerged_branch: dependabot/npm_and_yarn/audio-api/npm_and_yarn-30ad79c937 is 1 commit ahead of the default branch
open_pr: PR #1: Bump the npm_and_yarn group across 1 directory with 2 updates