A Python-based Retrieval-Augmented Generation (RAG) system for document Q&A using FastAPI, Streamlit, ChromaDB, and OpenAI.
https://github.com/davidbmar/rag-document-chat · public · shipped
This project implements a complete RAG pipeline that ingests PDF documents, processes them using hierarchical semantic grouping and compression, stores embeddings in ChromaDB, and provides a chat interface via Streamlit backed by a FastAPI server. It supports optional S3 storage for documents and uses OpenAI for both embedding generation and LLM responses.
pip install -r requirements.txt
python -m nltk.downloader punkt_tab punkt stopwords
uvicorn app:app --reload
streamlit run streamlit_app.py
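Once the backend is running, the /query endpoint that appears in the diagrams can in principle be exercised without the Streamlit UI. The sketch below only builds the request. It assumes uvicorn's default address (127.0.0.1:8000) and a JSON body with a single `question` field; the actual request schema is not documented here, so treat both as assumptions. `build_query_request` is a hypothetical helper, not part of the project.

```python
import json
import urllib.request

# Assumption: uvicorn's default bind address; adjust if the server runs elsewhere.
BASE_URL = "http://127.0.0.1:8000"


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /query endpoint."""
    # Assumption: the endpoint accepts a JSON object with a "question" field.
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step needs the FastAPI server to be up and at least one document already ingested.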
flowchart TD
User[User] -->|Uploads PDF| Streamlit[Streamlit Frontend]
Streamlit -->|POST /upload| FastAPI[FastAPI Backend]
FastAPI -->|Extract Text| PyPDF2[PyPDF2]
PyPDF2 -->|Raw Text| HierarchicalProc[HierarchicalProcessor]
HierarchicalProc -->|Logical Groups & Summaries| Embedder[OpenAI Embeddings]
Embedder -->|Vectors| ChromaDB[(ChromaDB)]
User -->|Asks Question| Streamlit
Streamlit -->|POST /query| FastAPI
FastAPI -->|Search| ChromaDB
ChromaDB -->|Context Chunks| FastAPI
FastAPI -->|Prompt + Context| OpenAI[OpenAI LLM]
OpenAI -->|Answer| FastAPI
FastAPI -->|Response| Streamlit
The backend is built with FastAPI to handle document uploads and query processing. It uses PyPDF2 for text extraction, NLTK for tokenization, and a custom HierarchicalProcessor to group sentences into logical units and generate compressed summaries (10:1 ratio). Embeddings are stored in ChromaDB (local or HTTP mode). The frontend is a Streamlit application that interacts with the FastAPI backend. AWS Boto3 is used for optional S3 integration.
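To make the grouping and 10:1 compression idea concrete, here is a minimal sketch. It is not the project's HierarchicalProcessor: the regex splitter stands in for NLTK tokenization, the fixed-size grouping rule is a placeholder for whatever semantic grouping the real processor applies, and all three function names are hypothetical. The only detail taken from the description above is the 10:1 ratio, interpreted here as a word budget for each summary.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (stand-in for NLTK's punkt tokenizer)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def group_sentences(sentences: List[str], group_size: int = 3) -> List[str]:
    """Join consecutive sentences into fixed-size groups (hypothetical rule)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [
        " ".join(sentences[i:i + group_size])
        for i in range(0, len(sentences), group_size)
    ]


def summary_word_budget(group: str, ratio: int = 10) -> int:
    """Word budget for a summary at the given compression ratio (minimum 1)."""
    return max(1, len(group.split()) // ratio)
```

In a fuller pipeline, each group and its budget would be handed to the LLM with an instruction to summarize within that many words, and both the group and its summary would be embedded and stored.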
sequenceDiagram
participant U as User
participant S as Streamlit UI
participant F as FastAPI Server
participant P as HierarchicalProcessor
participant C as ChromaDB
participant O as OpenAI API
Note over U, O: Document Ingestion Phase
U->>S: Upload PDF File
S->>F: POST /upload file
F->>F: Parse PDF with PyPDF2
F->>P: Process Text Hierarchically
P->>P: Group Sentences & Generate Summaries
P->>O: Generate Embeddings for Chunks
O-->>P: Return Vectors
P->>C: Store Chunks with Metadata
C-->>F: Confirm Storage
F-->>S: Upload Success
S-->>U: Display Success Message
Note over U, O: Query Phase
U->>S: Ask Question
S->>F: POST /query {question}
F->>O: Generate Query Embedding
O-->>F: Return Query Vector
F->>C: Similarity Search
C-->>F: Return Relevant Chunks
F->>O: Send Prompt with Context
O-->>F: Return Generated Answer
F-->>S: Return Answer
S-->>U: Display Answer
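The query phase above (embed the question, run a similarity search, send a prompt with context) can be sketched with standard-library stand-ins. This is an illustration under stated assumptions: `toy_embed` is a bag-of-words placeholder for an OpenAI embedding call, the in-memory ranking replaces the ChromaDB search, and the prompt wording is invented for the example rather than taken from the project.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question (vector store stand-in)."""
    query_vec = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble a context-grounded prompt (wording is hypothetical)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string would be sent to the LLM as the final step. Swapping `toy_embed` and `top_k` for real embedding and vector-store calls leaves `build_prompt` unchanged, which is one reason to keep prompt assembly separate from retrieval.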
Use this template to build internal knowledge base chatbots, legal document reviewers, or academic paper assistants. It is suitable for developers needing a structured RAG implementation with advanced chunking strategies beyond simple character splitting.