audio-ui-realtime-transcribe

What it is

A web application that records audio in the browser, splits it into chunks of a configurable length (5 seconds to 5 minutes), and uploads each chunk directly to AWS S3 using pre-signed URLs. It provides a mobile-optimized UI for managing recording sessions, playing back .webm files, and organizing recordings by human-readable timestamps. The backend is entirely serverless: AWS Lambda handles the API logic and Amazon Cognito provides per-user isolation.
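The configurable interval reduces to a simple bounds check. The sketch below is a minimal illustration of that check, not the project's code. Only the 5-second and 5-minute limits come from this README; `clampChunkMs` and `chunkCount` are hypothetical names.

```javascript
// Hypothetical helpers: only the 5 s and 5 min bounds come from the README.
const MIN_CHUNK_MS = 5 * 1000;      // 5 seconds
const MAX_CHUNK_MS = 5 * 60 * 1000; // 5 minutes

// Clamp a user-selected interval into the supported range.
function clampChunkMs(requestedMs) {
  if (!Number.isFinite(requestedMs)) return MIN_CHUNK_MS; // bad input: fall back to the minimum
  return Math.min(MAX_CHUNK_MS, Math.max(MIN_CHUNK_MS, Math.round(requestedMs)));
}

// Number of chunks a recording of totalMs produces; the last chunk may be shorter.
function chunkCount(totalMs, chunkMs) {
  if (totalMs <= 0) return 0;
  return Math.ceil(totalMs / clampChunkMs(chunkMs));
}
```

For example, a 65-second recording with a 30-second interval yields three chunks, and the last one holds 5 seconds of audio.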

Features

Browser-based chunked audio recording with configurable durations
Serverless file management with session-based organization
Mobile-optimized UI with iOS-style action sheets
Secure user-isolated storage via AWS Cognito and S3
Direct .webm audio playback with modal interface
Pre-signed URL upload mechanism for secure direct-to-S3 transfer
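Session-based organization by human-readable timestamps works best when the session ID is also sortable as a string. The sketch below assumes a `YYYY-MM-DD_HH-mm-ss` UTC format, which is a hypothetical choice, and does not claim the project uses it. `sessionIdFromDate` and `sessionLabel` are illustrative names.

```javascript
// Hypothetical session-ID format: "YYYY-MM-DD_HH-mm-ss" in UTC.
// Zero-padded fields make string order match chronological order.
function pad2(n) {
  return String(n).padStart(2, "0");
}

function sessionIdFromDate(date) {
  return (
    `${date.getUTCFullYear()}-${pad2(date.getUTCMonth() + 1)}-${pad2(date.getUTCDate())}` +
    `_${pad2(date.getUTCHours())}-${pad2(date.getUTCMinutes())}-${pad2(date.getUTCSeconds())}`
  );
}

// Convert a session ID back into a display label for the file manager.
function sessionLabel(sessionId) {
  const m = /^(\d{4})-(\d{2})-(\d{2})_(\d{2})-(\d{2})-(\d{2})$/.exec(sessionId);
  if (!m) return sessionId; // unknown format: show it unchanged
  const [, y, mo, d, h, mi, s] = m;
  return `${y}-${mo}-${d} ${h}:${mi}:${s} UTC`;
}
```

Because the ID is also a valid S3 key segment, the same string can name the session folder and be displayed to the user.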

Quickstart

git clone https://github.com/davidbmar/audio-ui-realtime-transcribe.git
cd audio-ui-realtime-transcribe
chmod +x step-*.sh
./step-10-setup.sh
./step-20-deploy-lambda.sh
./step-25-update-web-files.sh
./step-45-validation.sh

Architecture

flowchart TD
    User[User Browser] -->|HTTPS| CF[CloudFront Distribution]
    CF -->|Static Assets| S3Web[S3 Web Bucket]
    User -->|API Requests| AG[API Gateway]
    AG -->|Invoke| Lambda[Lambda Functions]
    Lambda -->|Auth Check| Cognito[Cognito User Pool]
    Lambda -->|Generate Pre-signed URL| S3Audio[S3 Audio Bucket]
    User -->|Direct Upload Chunk| S3Audio
    Lambda -->|Read/Write Metadata| S3Audio
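User isolation in this architecture comes from building every S3 key from the caller's verified Cognito identity, never from a user ID sent in the request body. The sketch below is a minimal illustration under stated assumptions. It expects either an API Gateway REST API with a Cognito User Pool authorizer (claims at `requestContext.authorizer.claims`) or an HTTP API JWT authorizer (claims at `requestContext.authorizer.jwt.claims`). The `users/{userId}/audio/sessions/` prefix comes from this README, while the `chunk-0001.webm` file naming is hypothetical.

```javascript
// Allowed characters for client-supplied session IDs (an assumed policy).
const SAFE_ID = /^[A-Za-z0-9_-]{1,64}$/;

// Read the Cognito "sub" from claims that API Gateway already verified.
function userIdFromEvent(event) {
  const auth = event?.requestContext?.authorizer ?? {};
  const claims = auth.claims ?? auth.jwt?.claims ?? {};
  if (typeof claims.sub !== "string" || claims.sub === "") {
    throw new Error("Unauthorized: no Cognito sub claim");
  }
  return claims.sub;
}

// Per-user prefix, matching the layout used by listSessions.
function sessionPrefix(userId) {
  return `users/${userId}/audio/sessions/`;
}

// Key for one chunk; the file naming is hypothetical.
function chunkKey(userId, sessionId, chunkNumber) {
  // Reject "/" and other characters so clients cannot alter the key layout.
  if (!SAFE_ID.test(sessionId)) throw new Error("Invalid sessionId");
  if (!Number.isInteger(chunkNumber) || chunkNumber < 0) throw new Error("Invalid chunkNumber");
  const file = `chunk-${String(chunkNumber).padStart(4, "0")}.webm`;
  return `${sessionPrefix(userId)}${sessionId}/${file}`;
}
```

The pre-signed PUT URL would then be generated for `chunkKey(userIdFromEvent(event), sessionId, chunkNumber)`. Each user can therefore only ever receive URLs under their own prefix.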

How it's built

The frontend uses vanilla HTML/CSS/JS (with React templates) for the recorder and file manager. The backend consists of Node.js Lambda functions behind API Gateway. Audio chunks are uploaded directly to S3 using pre-signed URLs generated by Lambda. Session metadata and file listings are managed via S3 object operations. Authentication is handled by Amazon Cognito.
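Managing file listings "via S3 object operations" means turning a flat list of object keys into sessions. The sketch below assumes keys of the form `users/{userId}/audio/sessions/{sessionId}/{file}`. Its input mirrors the `Contents` array (`Key`, `Size`) of an S3 ListObjectsV2 response. `groupSessions` is a hypothetical helper, not the code in `api/session-helpers.js`. ListObjectsV2 returns at most 1,000 keys per call, so a real handler would follow `ContinuationToken` until the listing is complete.

```javascript
// Group S3 objects into sessions for one user, newest session first.
// Keys outside the user's prefix, or without a session folder, are skipped.
function groupSessions(objects, userId) {
  const prefix = `users/${userId}/audio/sessions/`;
  const sessions = new Map();
  for (const obj of objects) {
    const key = obj.Key ?? "";
    if (!key.startsWith(prefix)) continue;
    const rest = key.slice(prefix.length);
    const slash = rest.indexOf("/");
    if (slash <= 0) continue; // no session folder in this key
    const sessionId = rest.slice(0, slash);
    const file = rest.slice(slash + 1);
    if (!file) continue; // "folder marker" object with an empty file name
    if (!sessions.has(sessionId)) {
      sessions.set(sessionId, { sessionId, files: [], totalBytes: 0 });
    }
    const s = sessions.get(sessionId);
    s.files.push(file);
    s.totalBytes += obj.Size ?? 0;
  }
  const list = [...sessions.values()];
  for (const s of list) s.files.sort(); // zero-padded chunk names sort in order
  // Newest first, assuming sortable timestamp-based session IDs.
  return list.sort((a, b) => (a.sessionId < b.sessionId ? 1 : a.sessionId > b.sessionId ? -1 : 0));
}
```

An alternative is to list with `Delimiter: "/"` and read `CommonPrefixes`, which returns one entry per session folder without enumerating every chunk.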

How it runs

sequenceDiagram
    participant U as User Browser
    participant AG as API Gateway
    participant L as Lambda (audio.js)
    participant S3 as S3 Bucket
    
    U->>AG: POST /upload-chunk (sessionId, chunkNumber)
    AG->>L: Invoke uploadChunk
    L->>L: Validate User Claims (Cognito)
    L->>L: Sign Pre-signed PUT URL (computed locally, no call to S3)
    L-->>AG: 200 OK { uploadUrl }
    AG-->>U: Return Signed URL
    
    U->>S3: PUT Audio Chunk (Binary)
    S3-->>U: 200 OK
    
    U->>AG: GET /sessions
    AG->>L: Invoke listSessions
    L->>S3: List Objects (users/{userId}/audio/sessions/)
    S3-->>L: Object Keys
    L-->>AG: 200 OK { sessions[] }
    AG-->>U: Return Session List
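The upload half of this sequence happens in the browser in two requests. The first is a POST to `/upload-chunk` for a signed URL, and the second is a PUT of the chunk bytes straight to S3. The sketch below is an assumption-laden illustration. The `apiBase` and `idToken` parameters, the JSON body shape, sending the raw ID token in `Authorization`, and the `audio/webm` content type are all guesses. `fetchImpl` defaults to the global `fetch` (browser or Node 18+) and can be swapped out for testing.

```javascript
// Two-step chunk upload: get a pre-signed URL from the API, then PUT to S3.
async function uploadChunk({ apiBase, idToken, sessionId, chunkNumber, blob, fetchImpl = fetch }) {
  // Step 1: ask the Lambda (via API Gateway) for a pre-signed PUT URL.
  const res = await fetchImpl(`${apiBase}/upload-chunk`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: idToken },
    body: JSON.stringify({ sessionId, chunkNumber }),
  });
  if (!res.ok) throw new Error(`upload-chunk failed: ${res.status}`);
  const { uploadUrl } = await res.json();

  // Step 2: send the audio directly to S3; API Gateway and Lambda never see it.
  // If the URL was signed with a ContentType, this header must match it.
  const put = await fetchImpl(uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": "audio/webm" },
    body: blob,
  });
  if (!put.ok) throw new Error(`S3 PUT failed: ${put.status}`);
  return uploadUrl;
}
```

Keeping the audio bytes off API Gateway avoids its payload size limits and Lambda execution time for the transfer. The trade-off is that the bucket needs a CORS rule allowing PUT from the CloudFront origin.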

How to apply & reuse

Deploy the infrastructure with the provided shell scripts, which set up the AWS resources (S3, Lambda, Cognito, CloudFront). Then open the web interface at the CloudFront distribution URL to record meetings or conversations, and use the file manager to review, play, and organize recorded sessions. Future phases will add semantic search and live transcription over this stored data.

At a glance

Capabilities: Audio Recording, File Management, User Authentication, Serverless Backend, Mobile Responsive UI

Components: api/audio.js, api/s3.js, api/session-helpers.js, web/index.html, web/audio.html.template, serverless.yml

Tech: HTML/CSS/JS, Node.js, AWS Lambda, Amazon S3, Amazon Cognito, API Gateway, CloudFront

Depends on: AWS Account, Node.js 18+, AWS CLI

Integrates with: AWS Cognito, AWS S3, AWS Lambda

Patterns: Pre-signed URLs, Chunked Upload, Serverless API, User Isolation

Reuse tags: audio-recording, serverless, aws, transcription-ready, meeting-intelligence