A low-latency, GPU-accelerated speech-to-text API server using NVIDIA Parakeet RNN-T and Riva.
https://github.com/davidbmar/nvidia-RNN-T-Parakeet · public · shipped
This project provides a production-ready FastAPI wrapper around NVIDIA's Parakeet RNN-T model, served via the NVIDIA Riva ASR framework. It targets low-latency (roughly 100 ms) real-time transcription, with streaming over WebSocket, word-level timestamps, and optional AWS integrations for audio ingestion and event handling.
./scripts/step-005-setup-parakeet-environment.sh
./scripts/step-010-install-parakeet-dependencies.sh
./scripts/step-015-download-parakeet-model.sh
./scripts/step-020-test-parakeet-inference.sh
./scripts/step-025-setup-gpu-environment.sh
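The numbered scripts are meant to run in sequence. As a minimal sketch (not part of the repository), the ordering could be automated from Python by sorting on the numeric prefix and stopping at the first failure. The helper names `ordered_steps` and `run_steps` are hypothetical, and the sketch assumes the scripts live in a `scripts/` directory and run under `bash`.

```python
import re
import subprocess
from pathlib import Path

# Matches names such as "step-005-setup-parakeet-environment.sh".
STEP_PATTERN = re.compile(r"step-(\d+)-.+\.sh$")


def ordered_steps(script_names):
    """Return the step scripts sorted by their numeric prefix."""
    numbered = []
    for name in script_names:
        match = STEP_PATTERN.search(name)
        if match:  # skip files that do not follow the step-NNN naming
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def run_steps(scripts_dir="scripts"):
    """Run every step script in order; stop at the first failure."""
    names = [p.name for p in Path(scripts_dir).glob("step-*.sh")]
    for name in ordered_steps(names):
        print(f"running {name}")
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(["bash", str(Path(scripts_dir) / name)], check=True)
```

Calling `run_steps()` from the repository root would execute the steps one after another. Sorting on the parsed integer rather than the raw file name keeps the order correct even if a step number is not zero-padded.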
flowchart TD
Client[Client App] -->|WebSocket/HTTP| API[FastAPI Server]
API -->|gRPC| Riva[NVIDIA Riva Service]
Riva -->|TensorRT| GPU[NVIDIA GPU]
GPU -->|Inference Result| Riva
Riva -->|Transcription| API
API -->|Store/Event| AWS[AWS S3 / EventBridge]
subgraph Docker Container
API
Riva
end
The system is built as a Dockerized Python application. A series of numbered shell scripts handles environment setup, dependency installation, model download from NGC, and GPU configuration. The core service is a FastAPI server that uses the Riva client SDK to run inference on the Parakeet 1.1B English model.
sequenceDiagram
participant C as Client
participant F as FastAPI Server
participant R as Riva Service
participant G as GPU
C->>F: POST /transcribe or WS Connect
F->>R: StreamingRecognize Request (audio chunks)
R->>G: Execute Parakeet Model Inference
G-->>R: Return Logits/Text
R-->>F: Partial/Final Transcription Results
F-->>C: JSON Response with Text & Timestamps
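The streaming step in the sequence above sends audio to Riva in chunks. The sketch below shows one way raw audio could be split into fixed-duration chunks before being forwarded. It is illustrative only: the 16 kHz, 16-bit mono PCM format, the 100 ms chunk size, and the function name `pcm_chunks` are assumptions, not values taken from the repository.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed input sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM


def pcm_chunks(audio: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM audio, each chunk_ms long.

    The final chunk is shorter when the audio length is not an exact
    multiple of the chunk size.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]
```

Smaller chunks reduce the delay before the first partial result but increase the number of messages sent per second of audio, so the chunk duration is a latency versus overhead trade-off.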
Deploy this system where low-latency transcription is critical, such as live captioning, voice assistants, or real-time meeting notes. It requires an NVIDIA GPU with CUDA support and an NGC API key. It can be integrated into larger pipelines via its REST API or WebSocket endpoints, with optional hooks for AWS S3 and EventBridge.
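Because deployment depends on an NVIDIA GPU and an NGC API key, a quick preflight check before running the setup scripts can save time. The following is a minimal sketch under stated assumptions: it treats the presence of `nvidia-smi` on `PATH` as a proxy for a working driver, and it assumes the key is supplied through an `NGC_API_KEY` environment variable, which may differ from how this repository's scripts read it. The function name `preflight_problems` is hypothetical.

```python
import os
import shutil


def preflight_problems(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # nvidia-smi ships with the NVIDIA driver, so its absence is a strong hint
    # that no usable GPU driver is installed on this host.
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH")
    # Assumed variable name; adjust to match the actual configuration.
    if not env.get("NGC_API_KEY"):
        problems.append("NGC_API_KEY is not set")
    return problems
```

Passing `env` and `which` as parameters keeps the check easy to exercise on machines without a GPU, for example in CI.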
✓ all on main — nothing unmerged.