CloudDrive: Real-Time Transcription Platform

What it is

CloudDrive is a full-stack web application that provides real-time audio transcription over WebSocket connections to GPU instances running WhisperLive. The vanilla JavaScript frontend is hosted on S3 behind CloudFront and authenticates users through AWS Cognito, while Node.js Lambda functions handle batch processing. The system also supports offline recording backed by IndexedDB retry queues and includes a transcript editor with word-level highlighting.

Features

Real-time transcription via WhisperLive over WebSockets
Batch transcription for uploaded audio files
Secure authentication and user management via AWS Cognito
Offline-first recording with IndexedDB and automatic retry
Interactive transcript editor with word-level highlighting
Automated infrastructure deployment via Bash scripts
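
The offline retry behavior listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `store` interface (`put`, `delete`, `getAll`), the `upload` callback, the record shape (`id`, `attempts`, `nextAttemptAt`), and the backoff constants are all assumptions, and an in-memory Map stands in for IndexedDB so the sketch runs outside a browser.

```javascript
// Hypothetical sketch of an offline retry queue. All names and constants
// are illustrative assumptions, not taken from the project.

// Exponential backoff: 1s, 2s, 4s, ... capped at 60s.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// In-memory stand-in for an IndexedDB object store keyed by record id.
function createMemoryStore() {
  const records = new Map();
  return {
    async put(record) { records.set(record.id, record); },
    async delete(id) { records.delete(id); },
    async getAll() { return [...records.values()]; },
  };
}

// Try to upload every queued record that is due. Successful uploads are
// removed; failures stay queued with a later nextAttemptAt.
async function drainQueue(store, upload, now = Date.now()) {
  let uploaded = 0;
  for (const record of await store.getAll()) {
    if (record.nextAttemptAt > now) continue; // not due yet
    try {
      await upload(record);
      await store.delete(record.id);
      uploaded += 1;
    } catch (err) {
      await store.put({
        ...record,
        attempts: record.attempts + 1,
        nextAttemptAt: now + backoffDelayMs(record.attempts),
      });
    }
  }
  return uploaded;
}
```

In a browser, a function like `drainQueue` might be called from an `online` event listener or on a timer, with the store backed by a real IndexedDB object store instead of the Map used here.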

Quickstart

./scripts/005-setup-configuration.sh
./scripts/010-setup-edge-box.sh
./scripts/305-setup-whisperlive-edge.sh
./scripts/020-deploy-gpu-instance.sh
./scripts/310-configure-whisperlive-gpu.sh
./scripts/030-configure-gpu-security.sh
./scripts/031-configure-edge-box-security.sh
./scripts/420-deploy-cognito-stack.sh
./scripts/425-deploy-recorder-ui.sh
./scripts/430-create-cognito-user.sh

Architecture

flowchart TD
    User[Browser Client] -->|WSS| Edge[Caddy Edge Box]
    User -->|HTTPS| CF[CloudFront CDN]
    Edge -->|TCP:9090| GPU[GPU Instance: WhisperLive]
    CF -->|Static Assets| S3[S3 Bucket]
    CF -->|API Requests| AGW[API Gateway]
    AGW -->|Auth| Cognito[AWS Cognito]
    AGW -->|Invoke| Lambda[AWS Lambda Node.js]
    Lambda -->|Read/Write| S3
    Lambda -->|Trigger| Batch[Batch Transcription]

How it's built

The frontend is built with vanilla HTML/JS, using the MediaRecorder API for audio capture and WebSockets for streaming. The backend uses the Serverless Framework to deploy Node.js 18.x Lambda functions behind API Gateway. Infrastructure consists of an Edge Box (a Caddy reverse proxy) and a GPU instance (g4dn.xlarge) running WhisperLive. Bash scripts automate deployment, driving the AWS CLI, CloudFormation, and instance configuration.
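
As a rough illustration of how MediaRecorder output can be forwarded over a WebSocket, the sketch below wires the two together. It is an assumption-laden sketch rather than the project's code: the function name, the 250 ms timeslice, and the drop-counting behavior are hypothetical, and the audio format WhisperLive actually expects on the wire is not described here, so a real client may need to convert chunks before sending them.

```javascript
// Hypothetical sketch: forward recorder chunks to an open WebSocket.
// `recorder` is expected to behave like a MediaRecorder and `socket` like a
// WebSocket; both are passed in so the wiring can be tried with fakes.
const WS_OPEN = 1; // numeric value of WebSocket.OPEN

function pipeRecorderToSocket(recorder, socket, timesliceMs = 250) {
  let dropped = 0;
  recorder.ondataavailable = (event) => {
    if (!event.data || event.data.size === 0) return; // skip empty chunks
    if (socket.readyState === WS_OPEN) {
      socket.send(event.data);
    } else {
      dropped += 1; // a real client might queue these for retry instead
    }
  };
  recorder.start(timesliceMs); // emit a chunk roughly every timesliceMs
  return {
    // Stops recording and reports how many chunks could not be sent.
    stop() {
      recorder.stop();
      return dropped;
    },
  };
}

// Browser usage (host name is a placeholder):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = new MediaRecorder(stream);
//   const socket = new WebSocket('wss://edge.example.com/transcribe');
//   socket.onopen = () => pipeRecorderToSocket(recorder, socket);
```

Passing the recorder and socket in as arguments keeps the wiring independent of browser globals, which makes it easy to exercise with simple stand-in objects.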

How it runs

sequenceDiagram
    participant Browser
    participant Caddy as Edge Box (Caddy)
    participant Whisper as WhisperLive (GPU)
    participant S3
    participant Lambda
    
    Browser->>Caddy: WebSocket Connect /transcribe
    Caddy->>Whisper: Forward Audio Stream
    Whisper-->>Caddy: Real-time Text Chunks
    Caddy-->>Browser: Return Transcribed Text
    
    Note over Browser,S3: Session End / Batch Upload
    Browser->>S3: Upload Audio File (via Presigned URL)
    S3-->>Browser: Confirm Upload
    Browser->>Lambda: Trigger Batch Transcription
    Lambda->>Whisper: Submit Audio for Processing
    Whisper-->>Lambda: Final Transcript
    Lambda->>S3: Store Transcript JSON
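
The batch half of the sequence above (presigned upload, then a trigger call) might look something like this on the client. The endpoint paths `/upload-url` and `/batch`, the JSON fields `uploadUrl` and `key`, the `Bearer` header format, and the function name are all hypothetical; the real API shape is not documented here. `fetchImpl` is injectable so the flow can be tried without a network.

```javascript
// Hypothetical sketch of the client side of the batch path. Endpoint paths
// and JSON field names are assumptions for illustration only.
async function uploadAndTranscribe({ apiBase, idToken, audio, fetchImpl = fetch }) {
  const authHeaders = { Authorization: `Bearer ${idToken}` };

  // 1. Ask the API for a presigned S3 upload URL.
  const urlRes = await fetchImpl(`${apiBase}/upload-url`, {
    method: 'POST',
    headers: authHeaders,
  });
  if (!urlRes.ok) throw new Error(`upload-url failed: ${urlRes.status}`);
  const { uploadUrl, key } = await urlRes.json();

  // 2. PUT the audio straight to S3; the presigned URL carries its own
  //    authorization, so no Cognito token is attached here.
  const putRes = await fetchImpl(uploadUrl, { method: 'PUT', body: audio });
  if (!putRes.ok) throw new Error(`S3 upload failed: ${putRes.status}`);

  // 3. Ask the API to start batch transcription for the uploaded object.
  const jobRes = await fetchImpl(`${apiBase}/batch`, {
    method: 'POST',
    headers: { ...authHeaders, 'Content-Type': 'application/json' },
    body: JSON.stringify({ key }),
  });
  if (!jobRes.ok) throw new Error(`batch trigger failed: ${jobRes.status}`);
  return jobRes.json();
}
```

Uploading directly to S3 with a presigned URL keeps large audio payloads out of API Gateway and Lambda, which only handle the small control requests.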

How to apply & reuse

Use this project as a reference architecture for building low-latency AI-powered media services on AWS. It demonstrates how to bridge browser-based media capture with high-performance GPU inference endpoints while maintaining secure, serverless storage and authentication patterns. The template-based UI deployment strategy is also reusable for static sites requiring environment-specific configuration injection.
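
To make the template-based configuration injection concrete, here is a minimal sketch of the general technique. The `{{NAME}}` placeholder syntax, the variable names, and the example values are assumptions chosen for illustration; the project's deployment scripts may use a different mechanism entirely.

```javascript
// Hypothetical sketch of build-time config injection. The {{NAME}}
// placeholder syntax and the variable names are assumptions.
function renderTemplate(template, values) {
  const missing = new Set();
  const output = template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (match, name) => {
    if (Object.prototype.hasOwnProperty.call(values, name)) {
      return String(values[name]);
    }
    missing.add(name);
    return match; // leave unknown placeholders in place for the error below
  });
  if (missing.size > 0) {
    throw new Error(`Missing template values: ${[...missing].join(', ')}`);
  }
  return output;
}

// Example: render a config snippet for one environment.
const template =
  'window.APP_CONFIG = { wsUrl: "{{WS_URL}}", poolId: "{{USER_POOL_ID}}" };';
const rendered = renderTemplate(template, {
  WS_URL: 'wss://edge.example.com/transcribe',
  USER_POOL_ID: 'us-east-1_EXAMPLE',
});
console.log(rendered);
```

Failing loudly on a missing value is a deliberate choice in this sketch: a static site deployed with an unreplaced placeholder tends to break in ways that are harder to diagnose than a failed build.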

At a glance

Capabilities: Real-time Speech-to-Text, Batch Audio Processing, Offline Data Persistence, Secure User Authentication, Cloud Storage Integration, Automated Infrastructure Provisioning

Components: Vanilla JS Frontend, Caddy Reverse Proxy, WhisperLive Inference Engine, AWS Lambda Functions, Serverless Framework Config, Bash Deployment Scripts

Tech: HTML5, JavaScript, Node.js 18.x, AWS Lambda, Amazon S3, Amazon CloudFront, Amazon Cognito, WhisperLive, Faster-Whisper, Caddy, Serverless Framework, Playwright

Depends on: AWS CLI, Node.js 18+, Bash Shell, SSH Key Pair, GPU Instance (g4dn.xlarge)

Integrates with: AWS Cognito, Amazon S3, Amazon CloudFront, Google Docs (via API), IndexedDB

Patterns: Serverless Backend, Edge Computing, WebSocket Streaming, Offline-First Architecture, Infrastructure as Code (Scripts), Template-Based UI Deployment

Reuse tags: real-time-transcription, aws-serverless, whisper-live, cognito-auth, gpu-inference, vanilla-js, saas-template

⚠ Needs attention

unmerged_branch: dependabot/npm_and_yarn/cognito-stack/npm_and_yarn-e9ce4f7be9 is 1 commit ahead of the default branch
open_pr: PR #1: chore(deps): Bump uuid from 9.0.1 to 14.0.0 in /cognito-stack in the npm_and_yarn group across 1 directory