recording_app

Browser-based voice memo recorder with chunked S3 streaming and local faster-whisper transcription.

https://github.com/davidbmar/recording_app  ·  private  ·  shipped

What it is

A privacy-focused voice recording system: it captures audio in the browser, streams it in chunks to a private AWS S3 bucket, and transcribes it with faster-whisper in a worker that runs locally on your Mac. A web interface tracks live status, and the system also supports intent recognition ('hey riff') and transcript hydration.

Features

Quickstart

# 1. Configure and deploy the web backend to AWS
cp env.sample .env
export TOKEN_SECRET="$(openssl rand -hex 32)"
./scripts/deploy.sh

# 2. Set up and run the transcription worker locally
cd /path/to/recording_app
python3 -m venv .wenv
.wenv/bin/pip install boto3
.wenv/bin/python -m worker.worker --bucket <bucket> --region us-east-2 --stub

Architecture

flowchart TD
    User[User Browser] -->|MediaStream API| Client[Web App Client]
    Client -->|Chunked Audio POST| Lambda[AWS Lambda Web App]
    Lambda -->|Auth Check| Auth[src/auth.py]
    Lambda -->|PutObject| S3[(AWS S3 Bucket)]
    S3 -->|Poll/ListObjects| Worker[Local Transcription Worker]
    Worker -->|Download Audio| S3
    Worker -->|Transcribe| Whisper[faster-whisper]
    Worker -->|Upload Transcript| S3
    Worker -->|Hydrate| LLM[DashScope/Qwen API]
    Lambda -->|Get Status| S3
    Client -->|Poll Status| Lambda
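
The Worker -> Transcribe step in the diagram above could look roughly like the sketch below. It is not the project's code: `transcribe_file` is a hypothetical helper, and the model size and compute type in the comments are assumptions. The `model` argument is expected to behave like a faster-whisper `WhisperModel`, whose `transcribe()` returns a lazy iterator of segments plus an info object.

```python
def transcribe_file(model, audio_path):
    """Join the text of all segments for one audio file (hypothetical helper).

    With faster-whisper installed, `model` could be created like this
    (model size and compute type are assumptions, not project settings):

        from faster_whisper import WhisperModel
        model = WhisperModel("base", device="cpu", compute_type="int8")
    """
    # transcribe() yields segments lazily; consuming them runs the decoding.
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text.strip() for segment in segments)
```

Passing the model in, rather than constructing it inside the function, lets a worker load the model once and reuse it across recordings.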

How it's built

The system consists of a Python-based AWS Lambda web application (served via Function URL) for handling uploads and auth, and a separate Python worker process that polls S3 for new recordings. The worker uses `faster-whisper` for transcription and `boto3` for S3 interaction. Frontend logic handles media streaming and intent resolution via vanilla JavaScript.
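
As an illustration of the polling step, here is a minimal sketch of how a worker might find recordings that still need a transcript. The key layout (`recordings/<id>/<file>`), the audio extensions, and the `transcript.txt` object name are assumptions made for this example, not the repository's actual layout. The `s3` argument is expected to be a boto3 S3 client, for example `boto3.client("s3", region_name="us-east-2")`.

```python
from collections import defaultdict

AUDIO_SUFFIXES = (".webm", ".wav", ".m4a")  # assumed audio extensions
TRANSCRIPT_NAME = "transcript.txt"          # assumed transcript object name


def find_pending_recordings(s3, bucket, prefix="recordings/"):
    """Return IDs of recordings that have audio objects but no transcript."""
    files_by_recording = defaultdict(set)
    # list_objects_v2 returns at most 1000 keys per call; the paginator
    # follows continuation tokens so larger buckets are fully listed.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            relative = obj["Key"][len(prefix):]
            recording_id, _, filename = relative.partition("/")
            if filename:
                files_by_recording[recording_id].add(filename)
    return sorted(
        rec_id
        for rec_id, names in files_by_recording.items()
        if TRANSCRIPT_NAME not in names
        and any(name.endswith(AUDIO_SUFFIXES) for name in names)
    )
```

A worker loop would call this function on a timer, process each returned ID, and upload the transcript so the recording is not picked up again on the next poll.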

How it runs

sequenceDiagram
    participant U as User
    participant B as Browser
    participant L as Lambda API
    participant S as S3 Bucket
    participant W as Local Worker
    participant M as Whisper Model

    U->>B: Start Recording
    B->>B: Capture MediaStream
    loop Every Chunk
        B->>L: POST audio chunk
        L->>L: Verify HMAC Token
        L->>S: Upload Chunk
    end
    B->>L: Finalize Recording
    W->>S: List New Objects
    S-->>W: Return Recording Key
    W->>S: Download Full Audio
    W->>M: Transcribe Audio
    M-->>W: Return Text
    W->>S: Upload Transcript.txt
    W->>S: Upload Hydrated.json
    U->>B: Refresh Page
    B->>L: Get Recording Status
    L->>S: Check Metadata
    S-->>L: Return Status
    L-->>B: JSON Status
    B-->>U: Display Transcript
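
The "Verify HMAC Token" step can be sketched with the Python standard library. The token format below (`subject.expiry.signature`) and the function names are assumptions for illustration; the actual format used by `src/auth.py` is not shown in this document. The secret would be the `TOKEN_SECRET` value generated in the Quickstart.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_token(secret: str, subject: str, expires_at: int) -> str:
    """Build a token of the assumed form '<subject>.<expiry>.<hex signature>'."""
    payload = f"{subject}.{expires_at}"
    sig = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_token(secret: str, token: str, now: Optional[int] = None) -> bool:
    """Check the signature and expiry without any server-side session state."""
    try:
        subject, expires_str, sig = token.rsplit(".", 2)
        expires_at = int(expires_str)
    except ValueError:
        return False  # malformed token
    payload = f"{subject}.{expires_str}"
    expected = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    current = int(time.time()) if now is None else now
    return current < expires_at
```

Because everything needed for verification is in the token and the shared secret, the Lambda can check each chunk upload without a database lookup.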

How to apply & reuse

Deploy the web backend to AWS using the provided shell scripts and IAM policies. Run the transcription worker locally on a machine with enough CPU or GPU capacity for faster-whisper, configured with least-privilege AWS credentials. Memos recorded in the web app are then transcribed automatically by the local worker.
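
To make "least-privilege" concrete, the sketch below builds an IAM policy document that lets a worker list, read, and write objects under a single prefix and nothing else. It is an illustration, not the repository's provided policy: the `recordings/` prefix, the statement IDs, and the bucket name are hypothetical.

```python
import json


def worker_policy(bucket: str, prefix: str = "recordings/") -> dict:
    """Return an IAM policy dict scoped to one bucket prefix (assumed layout)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, restricted here by prefix.
                "Sid": "ListRecordings",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Object-level read (audio) and write (transcripts) only.
                "Sid": "ReadAudioWriteTranscripts",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, printed as JSON ready to attach to a user or role.
    print(json.dumps(worker_policy("my-recordings-bucket"), indent=2))
```

Note that the policy grants no delete permission, so a compromised worker credential could not remove existing recordings.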

At a glance

Capabilities: Voice Recording, Audio Streaming, Speech-to-Text, Cloud Storage Integration, Local Processing, Intent Recognition
Components: Web Client (JS/HTML), AWS Lambda Backend, S3 Storage, Local Python Worker, Auth Module, Intent Resolver
Tech: Python, JavaScript, AWS Lambda, AWS S3, faster-whisper, boto3, Pygments, Mermaid
Depends on: boto3, faster-whisper, openai, markdown, pytest
Integrates with: AWS S3, AWS Lambda, DashScope (Qwen), Alibaba Cloud Model Studio
Patterns: Chunked Upload, Worker Pool, Stateless Auth, Polling, Least Privilege Access
Reuse tags: voice-memos, transcription, aws-serverless, local-ai, privacy-first

⚠ Needs attention