openai_transcribe · davidbmar.com

What it is

This project provides two distinct Node.js server implementations for transcribing audio using OpenAI's Whisper model. It acts as a backend proxy, accepting audio data via HTTP POST requests and forwarding it to the OpenAI API. The 'turn_based' version handles discrete audio file uploads, while the 'streaming_based' version attempts to handle continuous audio streams by buffering chunks before sending them for transcription.
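
The chunk buffering described for the streaming variant could look roughly like the sketch below. It is an illustration only, not the repository's code: the class name `ChunkBuffer` and the byte threshold are hypothetical, and the real server may flush on a timer or some other signal instead.

```javascript
// Hypothetical sketch: accumulate incoming audio chunks and hand back one
// combined Buffer once a size threshold is reached.
class ChunkBuffer {
  constructor(flushBytes) {
    this.flushBytes = flushBytes; // assumed threshold, not taken from the project
    this.chunks = [];
    this.size = 0;
  }

  // Add a chunk. Returns the combined Buffer when the threshold is reached,
  // otherwise null.
  push(chunk) {
    this.chunks.push(chunk);
    this.size += chunk.length;
    return this.size >= this.flushBytes ? this.flush() : null;
  }

  // Return everything buffered so far (or null if empty) and reset.
  flush() {
    if (this.size === 0) return null;
    const combined = Buffer.concat(this.chunks, this.size);
    this.chunks = [];
    this.size = 0;
    return combined;
  }
}
```

Each non-null result from `push` would then be sent for transcription as one request. Depending on the audio format, a slice cut at an arbitrary byte offset may not be a valid standalone file, which is one reason this mode is best treated as experimental.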

Features

Proxies audio data to OpenAI Whisper API
Supports turn-based file upload transcription
Experimental support for streaming audio buffering
Configurable via OPENAI_API_KEY environment variable
Includes CORS headers for browser compatibility
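
As a rough illustration of the CORS feature, a server built on the native http module might set its headers as follows. The header values and the helper names `applyCors` and `handlePreflight` are assumptions made for this sketch; the source does not show which origins or methods the project actually allows.

```javascript
// Hypothetical header set; the project's real values are not shown in the source.
const CORS_HEADERS = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'POST, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
};

// Attach the CORS headers to any response object that exposes setHeader().
function applyCors(res) {
  for (const [name, value] of Object.entries(CORS_HEADERS)) {
    res.setHeader(name, value);
  }
}

// Answer a browser preflight request. Returns true when the request was an
// OPTIONS preflight and has been fully handled.
function handlePreflight(req, res) {
  applyCors(res);
  if (req.method === 'OPTIONS') {
    res.writeHead(204);
    res.end();
    return true;
  }
  return false;
}
```

A request handler would call `handlePreflight(req, res)` first and return early when it reports true. Allowing every origin with `*` is convenient for a prototype; a deployed service would more likely name specific origins.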

Quickstart

npm install
export OPENAI_API_KEY=your_api_key
node turn_based_transcribe/server.js

Architecture

flowchart TD
    Client[Web Client] -->|POST /audio| Server[Node.js HTTP Server]
    Server -->|Check Env| Env[OPENAI_API_KEY]
    Server -->|POST FormData| OpenAI[OpenAI Whisper API]
    OpenAI -->|JSON Transcript| Server
    Server -->|JSON Response| Client

How it's built

The servers are built on native Node.js modules (http, fs, path). The turn-based variant adds two small dependencies, axios and form-data. Configuration comes from environment variables, and OPENAI_API_KEY is required. The servers set basic CORS headers so that web clients on other origins can call them.
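
To make the request construction concrete, here is a minimal sketch of how the outbound call to OpenAI could be assembled. It deliberately swaps in the `FormData` and `Blob` globals available in Node.js 18 and later instead of the project's axios and form-data packages, so that it runs without installing anything. The function name `buildTranscriptionRequest` and the default filename are hypothetical; the endpoint path, the `file` and `model` form fields, and the Bearer authorization header follow OpenAI's public transcription API.

```javascript
// Sketch only: uses Node 18+ globals rather than the project's axios/form-data.
function buildTranscriptionRequest(audioBuffer, apiKey, filename = 'audio.webm') {
  // Fail early when the key is missing, mirroring the "Check Env" step.
  if (!apiKey) {
    throw new Error('OPENAI_API_KEY is not set');
  }

  const form = new FormData();
  form.append('file', new Blob([audioBuffer]), filename); // filename is an assumption
  form.append('model', 'whisper-1');

  return {
    url: 'https://api.openai.com/v1/audio/transcriptions',
    options: {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form, // fetch sets the multipart boundary header itself
    },
  };
}
```

The returned pair can be passed straight to `fetch(url, options)`. Keeping the construction separate from the network call makes the key check and form fields easy to inspect without contacting the API.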

How it runs

sequenceDiagram
    participant C as Client
    participant S as Node.js Server
    participant O as OpenAI API
    C->>S: POST /audio (Audio Data)
    S->>S: Validate OPENAI_API_KEY
    S->>S: Construct FormData
    S->>O: POST /v1/audio/transcriptions
    O-->>S: JSON { text: "..." }
    S-->>C: 200 OK (Transcription Text)
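
The sequence above can be sketched as a single request handler. This is an illustration under stated assumptions rather than the project's implementation: `createHandler`, `sendJson`, the injected `transcribe` function, and the 404, 500, and 502 status codes are all hypothetical choices. Only the POST /audio route and the key check come from the diagram.

```javascript
// Write a JSON response with the given status code.
function sendJson(res, status, payload) {
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(payload));
}

// Build a handler for http.createServer. `transcribe` is an injected async
// function (audioBuffer, apiKey) => text, so the OpenAI call can be stubbed.
function createHandler({ apiKey, transcribe }) {
  return async function handler(req, res) {
    if (req.method !== 'POST' || req.url !== '/audio') {
      sendJson(res, 404, { error: 'Not found' });
      return;
    }
    if (!apiKey) {
      sendJson(res, 500, { error: 'OPENAI_API_KEY is not set' });
      return;
    }

    // Collect the request body; incoming requests are async-iterable streams.
    const chunks = [];
    for await (const chunk of req) {
      chunks.push(chunk);
    }

    try {
      const text = await transcribe(Buffer.concat(chunks), apiKey);
      sendJson(res, 200, { text });
    } catch (err) {
      // Hide upstream details from the client.
      sendJson(res, 502, { error: 'Transcription failed' });
    }
  };
}
```

A server would be started with something like `http.createServer(createHandler({ apiKey: process.env.OPENAI_API_KEY, transcribe })).listen(port)`, where `transcribe` wraps the call to /v1/audio/transcriptions and `port` is whatever the deployment uses.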

How to apply & reuse

Use this project as a simple backend service to add speech-to-text to web applications without exposing your OpenAI API key to client-side code. It is suited to prototypes and internal tools that do not need a full-featured media processing pipeline.
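
From the client side, calling the proxy might look like the sketch below. Several details are assumptions: the source does not state the port, whether the server expects raw bytes or a multipart body, or the exact response shape, so this example posts raw bytes with a generic content type and returns the response body as text. The names `buildUploadRequest` and `transcribeViaProxy` are hypothetical.

```javascript
// Build the request for the proxy's POST /audio route.
// Assumption: the server accepts the raw audio bytes as the request body.
function buildUploadRequest(audioBytes, baseUrl) {
  return {
    url: new URL('/audio', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/octet-stream' },
      body: audioBytes,
    },
  };
}

// Send audio to the proxy and return the response body as text.
// `fetchImpl` is injectable so the function can be exercised without a server.
async function transcribeViaProxy(audioBytes, baseUrl, fetchImpl = fetch) {
  const { url, options } = buildUploadRequest(audioBytes, baseUrl);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Proxy returned HTTP ${response.status}`);
  }
  return response.text();
}
```

The OpenAI key never appears in this code, which is the point of the pattern: the browser only knows the proxy's address.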

At a glance

Capabilities: Audio Transcription, HTTP Proxying, API Key Management

Components: turn_based_transcribe/server.js, streaming_based_transcribe/server.js

Tech: Node.js, JavaScript, HTTP, OpenAI API

Depends on: axios, form-data

Integrates with: OpenAI Whisper

Patterns: Proxy Pattern, Server-Side API Key Storage

Reuse tags: speech-to-text, openai-wrapper, nodejs-server, audio-processing

Repo hygiene

✓ all on main — nothing unmerged.