Open Source Sound Generator LLMs

A local browser-based benchmarking suite for comparing nine open-source audio generation models with real-time progress streaming and Apple Silicon optimization.

https://github.com/davidbmar/opensource_sound_generator_llms  ·  public  ·  shipped


What it is

A full-stack application with a React frontend and a FastAPI backend. It lets users generate sound effects, music, and speech, and compare and benchmark the models that produce them. It streams generation progress in real time over Server-Sent Events (SSE), keeps an LRU cache so repeated requests return instantly, and automatically detects the available hardware so it can run on MPS (Apple Silicon), CUDA, or CPU.
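
The project's detection code is not reproduced here. The sketch below shows one common way to do this with PyTorch's availability checks. The function names `select_device` and `detect_device` are hypothetical, and so is the CUDA-before-MPS preference order.

```python
"""Hypothetical device selector: prefer CUDA, then MPS, else CPU (assumed order)."""


def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Pure decision logic, kept separate so it can be tested without a GPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"  # Apple Silicon via Metal Performance Shaders
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends exist; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps only exists in PyTorch 1.12+, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return select_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

You can pass the returned string to `model.to(device)` or to a Diffusers pipeline's `.to(device)`.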


Quickstart

git clone https://github.com/davidbmar/opensource_sound_generator_llms.git
cd opensource_sound_generator_llms
make install
make start

Architecture

flowchart TD
    subgraph Client["Browser (React/Vite :5173)"]
        UI["UI Components\n(Model Selector, Prompt Input)"]
        Player["Audio Player & Waveform"]
        Table["Compare Table"]
    end

    subgraph Server["Python Backend (FastAPI :8000)"]
        API["API Routes\n(/api/generate-stream, /api/models)"]
        MM["ModelManager\n(Asyncio Lock, One-at-a-time)"]
        Cache["GenerationCache\n(LRU Store)"]
        Gen["Generators\n(AudioLDM, MusicGen, Bark, etc.)"]
        Utils["Audio Utils\n(Numpy to WAV)"]
    end

    UI -->|HTTP/SSE| API
    API -->|Manage Lifecycle| MM
    API -->|Check/Store| Cache
    MM -->|Load/Unload| Gen
    Gen -->|Raw Audio| Utils
    Utils -->|WAV Bytes| Cache
    Cache -->|WAV Bytes| API
    API -->|SSE Progress| UI
    API -->|WAV Binary| Player
    Player -->|Display Data| Table
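
The diagram describes GenerationCache only as an LRU store. The sketch below shows one way such a store could work, built on `collections.OrderedDict`. The key layout (model, prompt, sorted parameters), the `max_entries` default, and the method names are assumptions, not taken from the repository.

```python
from collections import OrderedDict
from typing import Any, Optional, Tuple

CacheKey = Tuple[Any, ...]


class GenerationCache:
    """Hypothetical LRU cache mapping generation requests to encoded WAV bytes."""

    def __init__(self, max_entries: int = 32) -> None:
        self.max_entries = max_entries
        self._store: "OrderedDict[CacheKey, bytes]" = OrderedDict()

    @staticmethod
    def make_key(model: str, prompt: str, **params: Any) -> CacheKey:
        # Sort params so keyword order does not matter; values must be hashable.
        return (model, prompt, tuple(sorted(params.items())))

    def get(self, key: CacheKey) -> Optional[bytes]:
        wav = self._store.get(key)
        if wav is not None:
            self._store.move_to_end(key)  # mark as most recently used
        return wav

    def put(self, key: CacheKey, wav: bytes) -> None:
        self._store[key] = wav
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry

    def __len__(self) -> int:
        return len(self._store)
```

The cache stores the already-encoded WAV bytes rather than the raw array, so a hit can be served without re-running the NumPy-to-WAV conversion.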

How it's built

The backend uses FastAPI with the PyTorch, Diffusers, and Transformers libraries to load models and run inference. An asyncio lock ensures that only one model resides in memory at a time, and gc.collect() reclaims memory when a model is unloaded. The frontend is built with React and Vite. It consumes SSE streams for progress updates and fetches the binary WAV data in a separate request, rather than through the stream, to avoid proxy limitations. Audio processing relies on NumPy and SoundFile for format conversion.
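
The one-model-at-a-time policy can be sketched as an async context manager guarded by `asyncio.Lock`. This is an illustration under assumptions. The `use` method, the `loaders` mapping of zero-argument loader functions, and the `load_count` counter are hypothetical, and the real ModelManager may be structured differently.

```python
import asyncio
import gc
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable, Dict, Optional


class ModelManager:
    """Hypothetical manager that keeps at most one model resident at a time."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders        # model name -> zero-argument loader function
        self._lock = asyncio.Lock()    # held across load, inference, and unload
        self._model: Any = None
        self.current_name: Optional[str] = None
        self.load_count = 0

    def _unload(self) -> None:
        # Drop the only reference to the old model, then force a collection.
        self._model = None
        self.current_name = None
        gc.collect()

    @asynccontextmanager
    async def use(self, name: str) -> AsyncIterator[Any]:
        """Yield the requested model, swapping out any other model first."""
        async with self._lock:
            if self.current_name != name:
                self._unload()
                # Loading weights blocks, so run it in a worker thread.
                self._model = await asyncio.to_thread(self._loaders[name])
                self.current_name = name
                self.load_count += 1
            yield self._model
```

Requests for the model that is already loaded reuse it. A request for a different model waits for the lock, then triggers a swap. With PyTorch models, `gc.collect()` alone may not release memory held by the allocator cache. Calling `torch.cuda.empty_cache()`, or `torch.mps.empty_cache()` on recent PyTorch versions, is a common addition.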

How it runs

sequenceDiagram
    participant User as Browser UI
    participant API as FastAPI Endpoint
    participant Manager as ModelManager
    participant Model as Loaded Model
    participant Cache as GenerationCache

    User->>API: POST /api/generate-stream
    API->>Cache: Check cache key
    alt Cache Hit
        Cache-->>API: Return cached WAV bytes
        API-->>User: SSE: complete (instant)
    else Cache Miss
        API->>Manager: Acquire lock & get model
        Manager->>Model: Ensure loaded (warm-up if needed)
        loop Inference Steps
            Model-->>API: Yield progress (step N/total)
            API-->>User: SSE: progress event
        end
        Model-->>API: Return raw audio array
        API->>Cache: Store WAV bytes
        API-->>User: SSE: complete (audio_id)
        User->>API: GET /api/audio/{id}
        API-->>User: Binary WAV data
    end
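
The sequence above can be approximated with an async generator that emits Server-Sent Events frames. Several parts of this sketch are assumptions: the event names (`progress`, `complete`), the JSON payload fields, the placeholder "inference" loop and fake WAV bytes, and the plain dicts standing in for the LRU cache and the audio store. In FastAPI, such a generator would typically be wrapped in `StreamingResponse(..., media_type="text/event-stream")`.

```python
import asyncio
import json
import uuid
from typing import AsyncIterator, Dict, Tuple

CacheKey = Tuple[str, str, int]


def sse(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def generate_stream(
    model: str,
    prompt: str,
    steps: int,
    cache: Dict[CacheKey, bytes],
    audio_store: Dict[str, bytes],
) -> AsyncIterator[str]:
    """Yield SSE frames for one request; WAV bytes go to audio_store, not the stream."""
    key = (model, prompt, steps)
    wav = cache.get(key)
    if wav is None:
        # Cache miss: report each step as it completes.
        for step in range(1, steps + 1):
            await asyncio.sleep(0)  # placeholder for one real inference step
            yield sse("progress", {"step": step, "total": steps})
        wav = b"RIFF" + prompt.encode()  # placeholder bytes, not a valid WAV file
        cache[key] = wav
        cached = False
    else:
        cached = True  # cache hit: skip straight to completion
    # The client fetches the bytes afterwards via GET /api/audio/{audio_id}.
    audio_id = uuid.uuid4().hex
    audio_store[audio_id] = wav
    yield sse("complete", {"audio_id": audio_id, "cached": cached})
```

The stream carries only a short `audio_id` and never the audio itself, so each SSE frame stays small. This matches the project's choice to fetch the binary WAV in a separate request.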

How to apply & reuse

Use this project to evaluate the quality, speed, and resource consumption of different open-source audio models before integrating them into production pipelines. It serves as a reference implementation for handling large model swaps, streaming inference progress, and managing GPU/MPS memory constraints in local AI applications.

At a glance

Capabilities: Text-to-Audio Generation, Text-to-Music Generation, Text-to-Speech Synthesis, Real-time Progress Streaming, Automated Performance Benchmarking, Local Model Management
Components: FastAPI Backend, React Frontend, ModelManager Service, GenerationCache Store, Benchmark Script, Audio Utilities Module
Tech: Python 3.13+, FastAPI, React, Vite, PyTorch, Diffusers, Transformers, NumPy, SoundFile
Depends on: Node.js 20+, PyTorch, Hugging Face Hub, libsndfile (via soundfile), Make
Integrates with: Hugging Face Model Hub, Apple Metal Performance Shaders (MPS), NVIDIA CUDA
Patterns: Server-Sent Events (SSE), LRU Caching, Singleton Model Loading, Lazy Initialization, Resource Locking
Reuse tags: audio-generation, llm-benchmarking, fastapi-react, mps-optimization, sse-streaming, local-ai
