Browser-based voice memo recorder with chunked S3 streaming and local faster-whisper transcription.
https://github.com/davidbmar/recording_app · private · shipped
A privacy-focused voice recording system: it captures audio in the browser, streams it in chunks to private AWS S3 storage, and transcribes it with faster-whisper on a local worker (running on your Mac). It includes a web interface for live status tracking and supports intent recognition ('hey riff') and transcript hydration.
cp env.sample .env
export TOKEN_SECRET="$(openssl rand -hex 32)"
./scripts/deploy.sh
cd /path/to/recording_app
python3 -m venv .wenv
.wenv/bin/pip install boto3
.wenv/bin/python -m worker.worker --bucket <bucket> --region us-east-2 --stub
flowchart TD
User[User Browser] -->|MediaStream API| Client[Web App Client]
Client -->|Chunked Audio POST| Lambda[AWS Lambda Web App]
Lambda -->|Auth Check| Auth[src/auth.py]
Lambda -->|PutObject| S3[(AWS S3 Bucket)]
S3 -->|Poll/ListObjects| Worker[Local Transcription Worker]
Worker -->|Download Audio| S3
Worker -->|Transcribe| Whisper[faster-whisper]
Worker -->|Upload Transcript| S3
Worker -->|Hydrate| LLM[DashScope/Qwen API]
Lambda -->|Get Status| S3
Client -->|Poll Status| Lambda
The system consists of a Python-based AWS Lambda web application (served via Function URL) for handling uploads and auth, and a separate Python worker process that polls S3 for new recordings. The worker uses `faster-whisper` for transcription and `boto3` for S3 interaction. Frontend logic handles media streaming and intent resolution via vanilla JavaScript.
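As an illustration of the polling step, here is a minimal sketch of how a worker might decide which recordings still need a transcript. It is not the repository's actual worker code: the `.webm` and `.txt` suffixes, the convention that a transcript sits next to its audio under the same key stem, and the function names are all assumptions. The only `boto3` call used is the `list_objects_v2` paginator, and the S3 client is passed in so the logic can be exercised without AWS access.

```python
from typing import Iterable, Iterator, List

AUDIO_SUFFIX = ".webm"       # assumed audio extension, not confirmed by the project
TRANSCRIPT_SUFFIX = ".txt"   # assumed transcript extension


def list_keys(s3, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix`, following S3 pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty listing carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def pending_recordings(keys: Iterable[str]) -> List[str]:
    """Return audio keys that have no matching transcript key yet."""
    known = set(keys)
    pending = []
    for key in sorted(known):
        if key.endswith(AUDIO_SUFFIX):
            transcript = key[: -len(AUDIO_SUFFIX)] + TRANSCRIPT_SUFFIX
            if transcript not in known:
                pending.append(key)
    return pending
```

With a real client this would be called as `pending_recordings(list_keys(boto3.client("s3", region_name="us-east-2"), "<bucket>"))`, repeated on a timer. Deriving the work list from the bucket listing alone keeps the worker stateless, at the cost of one full listing per poll.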
sequenceDiagram
participant U as User
participant B as Browser
participant L as Lambda API
participant S as S3 Bucket
participant W as Local Worker
participant M as Whisper Model
U->>B: Start Recording
B->>B: Capture MediaStream
loop Every Chunk
B->>L: POST audio chunk
L->>L: Verify HMAC Token
L->>S: Upload Chunk
end
B->>L: Finalize Recording
W->>S: List New Objects
S-->>W: Return Recording Key
W->>S: Download Full Audio
W->>M: Transcribe Audio
M-->>W: Return Text
W->>S: Upload Transcript.txt
W->>S: Upload Hydrated.json
U->>B: Refresh Page
B->>L: Get Recording Status
L->>S: Check Metadata
S-->>L: Return Status
L-->>B: JSON Status
B-->>U: Display Transcript
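The "Verify HMAC Token" step in the sequence above could look roughly like the following sketch. The token layout (`base64(user:expiry).hex_signature`), the function names, and the use of SHA-256 are assumptions made for illustration; the actual format used by `src/auth.py` is not described here. The secret is assumed to be the `TOKEN_SECRET` value generated during setup.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def sign_token(secret: str, user: str, expires_at: int) -> str:
    """Build a token of the (assumed) form base64(user:expiry).signature."""
    payload = f"{user}:{expires_at}".encode()
    sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig


def verify_token(secret: str, token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the token is authentic and unexpired, else None."""
    try:
        encoded, sig = token.split(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
        sig_bytes = sig.encode()
    except ValueError:
        return None  # malformed token: missing separator or bad base64
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), sig_bytes):
        return None
    try:
        user, expires_at = payload.decode().rsplit(":", 1)
        expiry = int(expires_at)
    except ValueError:
        return None
    current = time.time() if now is None else now
    return user if current < expiry else None
```

Checking the signature before parsing the payload means untrusted fields are never interpreted unless they were signed with the shared secret.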
Deploy the web backend to AWS using the provided shell scripts and IAM policies. Run the transcription worker locally on a machine with enough CPU or GPU capacity for faster-whisper, configured with least-privilege AWS credentials. Memos recorded in the web app are then transcribed automatically by the local worker.
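To make "least-privilege" concrete, the sketch below generates one possible IAM policy for the worker, granting only what the flow described here needs: listing the bucket, downloading audio, and uploading transcripts. It is an illustration, not the policy shipped with the project; the `worker_policy` helper, the statement IDs, and the `example-bucket` name are hypothetical.

```python
import json


def worker_policy(bucket: str) -> dict:
    """Return an IAM policy document limited to one bucket (illustrative only)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself, not to its objects.
                "Sid": "ListRecordings",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object reads and writes apply to keys inside the bucket.
                "Sid": "ReadAudioWriteTranscripts",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(worker_policy("example-bucket"), indent=2))
```

A tighter variant could restrict the object statement to specific key prefixes if recordings and transcripts live under known paths.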