Self-contained Dockerized RAG service for semantic document search using LanceDB and Nomic embeddings.
https://github.com/davidbmar/voice-optimal-RAG · public · shipped
A lightweight, single-container Retrieval-Augmented Generation (RAG) backend designed to feed context into voice assistants. It ingests PDF, Markdown, TXT, DOCX, and HTML files, splits them into token-aware chunks, generates 768-dimensional vectors using Nomic Embed Text v1.5, and stores them in an embedded LanceDB instance. It exposes a REST API and a simple web UI for uploading documents and performing semantic similarity searches.
docker build -t rag-service .
docker run -d --name rag-service -p 8100:8100 -v rag-data:/data rag-service
open http://localhost:8100
flowchart TD
Client[Client / Voice Assistant] -->|HTTP POST /upload| API[FastAPI App]
Client -->|HTTP POST /query| API
API -->|Orchestrate| Pipeline[Document Pipeline]
Pipeline -->|Parse| Parsers[Parsers: PyMuPDF/Text]
Parsers -->|Raw Text| Chunker[Chunker: tiktoken]
Chunker -->|Text Chunks| Embedder[Embedder: SentenceTransformers]
Embedder -->|Vectors| Store[LanceDB Vector Store]
Store -->|Similarity Results| API
API -->|JSON Response| Client
subgraph Data Persistence
Store -->|/data/lancedb| Volume[Docker Volume]
end
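The flow in the diagram above can be sketched end to end in a few lines of Python. This is a minimal, offline illustration rather than the repository's code: `toy_embed` and `InMemoryStore` are hypothetical stand-ins for the Nomic embedder and the LanceDB table, the word-based chunking is a simplification, and cosine similarity is an assumed scoring metric. Only the method names `insert_chunks` and `search` mirror the calls shown in this document.

```python
import hashlib
import math
from dataclasses import dataclass, field


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Stand-in embedder (NOT Nomic Embed Text): hash words into a unit vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryStore:
    """Stand-in for the LanceDB table: a plain list of (chunk, vector) rows."""

    rows: list[tuple[str, list[float]]] = field(default_factory=list)

    def insert_chunks(self, chunks: list[str], vectors: list[list[float]]) -> None:
        self.rows.extend(zip(chunks, vectors))

    def search(self, vector: list[float], top_k: int) -> list[tuple[float, str]]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(vector, row_vec)), chunk)
            for chunk, row_vec in self.rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def ingest_text(text: str, store: InMemoryStore, chunk_words: int = 8) -> int:
    """Chunk -> embed -> store. Returns the number of chunks written."""
    words = text.split()
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    store.insert_chunks(chunks, [toy_embed(chunk) for chunk in chunks])
    return len(chunks)


def query(question: str, store: InMemoryStore, top_k: int = 3) -> list[tuple[float, str]]:
    """Embed the question and return the top_k (score, chunk) pairs."""
    return store.search(toy_embed(question), top_k)
```

In the real service the embedder would produce 768-dimensional vectors and the store would persist to `/data/lancedb`; the control flow is the part this sketch is meant to show.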
Built with FastAPI for the web server, sentence-transformers for embedding generation, and LanceDB for vector storage. PDFs are parsed with PyMuPDF, and tiktoken counts tokens so that the recursive splitter can keep each chunk within a token budget. The whole stack ships as a single Docker image with the embedding model downloaded at build time, so the container starts quickly and needs no external services or model downloads at runtime.
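To make the idea of token-aware chunking concrete, here is a minimal sketch of a fixed-size token window with overlap. It is deliberately simpler than a recursive splitter and is not taken from the repository's `chunker.py`. The function name, the `max_tokens=256` and `overlap=32` defaults, and the whitespace tokenizer used as the default are all assumptions chosen so the example runs without any downloads.

```python
from typing import Callable, Sequence


def chunk_by_tokens(
    text: str,
    max_tokens: int = 256,   # assumed budget, not the project's setting
    overlap: int = 32,       # assumed overlap, not the project's setting
    encode: Callable[[str], Sequence] = str.split,
    decode: Callable[[Sequence], str] = " ".join,
) -> list[str]:
    """Split text into windows of at most max_tokens tokens.

    Consecutive windows share `overlap` tokens so that a sentence cut at a
    boundary still appears intact in one of the two chunks.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    tokens = list(encode(text))
    step = max_tokens - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks
```

To count real model tokens instead of whitespace-separated words, the `encode` and `decode` methods of a tiktoken encoding (for example the one returned by `tiktoken.get_encoding("cl100k_base")`) can be passed in place of the defaults.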
sequenceDiagram
participant C as Client
participant A as FastAPI App
participant P as Document Pipeline
participant E as Embedder
participant V as LanceDB
Note over C, V: Ingestion Flow
C->>A: POST /upload (files)
A->>P: ingest_file(filepath)
P->>P: Parse file (parsers.py)
P->>P: Chunk text (chunker.py)
P->>E: embed_batch(chunks)
E->>E: Model.encode()
E-->>P: Vectors
P->>V: insert_chunks(vectors)
V-->>P: Confirm Storage
P-->>A: Document Info
A-->>C: JSON Response
Note over C, V: Query Flow
C->>A: POST /query {query, top_k}
A->>E: embed_text(query)
E->>E: Model.encode()
E-->>A: Query Vector
A->>V: search(vector, top_k)
V-->>A: Ranked Results
A-->>C: JSON Results with Scores
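The query flow above can be exercised from a client with only the standard library. The sketch below assumes the service is reachable at `http://localhost:8100` (the port from the `docker run` command) and that `/query` accepts a JSON body with `query` and `top_k`, as the diagram indicates. The helper names, the `top_k=5` default, and the timeout are hypothetical, and the response is returned as parsed JSON without assuming its field names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8100"  # assumed: port published by docker run


def build_query_request(
    query: str, top_k: int = 5, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(
    query: str, top_k: int = 5, base_url: str = BASE_URL, timeout: float = 10.0
):
    """Send the query and return the decoded JSON response as-is."""
    request = build_query_request(query, top_k, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `run_query("How do I reset the device?", top_k=3)` would return whatever JSON the service produces for the three best-matching chunks. Separating request construction from sending keeps the payload easy to inspect or unit test without a live server.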
Deploy it as a sidecar or standalone service for applications that need local, private semantic search over documentation. It suits voice assistant backends that need low-latency context retrieval without relying on an external cloud vector database. The provided GitHub indexing scripts can keep the index of technical documentation up to date automatically.