rag-document-chat

A Python-based Retrieval-Augmented Generation (RAG) system for document Q&A using FastAPI, Streamlit, ChromaDB, and OpenAI.

https://github.com/davidbmar/rag-document-chat  ·  public  ·  shipped

What it is

This project implements a complete RAG pipeline that ingests PDF documents, processes them using hierarchical semantic grouping and compression, stores embeddings in ChromaDB, and provides a chat interface via Streamlit backed by a FastAPI server. It supports optional S3 storage for documents and uses OpenAI for both embedding generation and LLM responses.
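
The storage step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Chunk` dataclass, the `chunk_id` and `to_records` helpers, and the metadata field names (`source`, `level`, `index`) are all hypothetical. It shows how processed text might be turned into parallel lists of IDs, documents, and metadata before being handed to a vector store.

```python
from dataclasses import dataclass
import hashlib


@dataclass
class Chunk:
    """One unit of processed text (hypothetical shape, not the project's own class)."""
    text: str
    source: str  # e.g. the PDF filename
    level: str   # "group" for original text, "summary" for its compressed form
    index: int   # position of the chunk within the document


def chunk_id(chunk: Chunk) -> str:
    """Derive a deterministic ID so the same chunk always maps to the same key."""
    key = f"{chunk.source}:{chunk.level}:{chunk.index}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


def to_records(chunks: list[Chunk]) -> dict[str, list]:
    """Build parallel lists of ids, documents, and metadata for a vector store."""
    return {
        "ids": [chunk_id(c) for c in chunks],
        "documents": [c.text for c in chunks],
        "metadatas": [
            {"source": c.source, "level": c.level, "index": c.index}
            for c in chunks
        ],
    }
```

The three keys mirror keyword arguments that ChromaDB's `collection.add` accepts, so the resulting dictionary could be unpacked into that call once embeddings are available.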

Quickstart

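# Install dependencies and the NLTK data used for tokenization
pip install -r requirements.txt
python -m nltk.downloader punkt_tab punkt stopwords

# Start the FastAPI backend (uvicorn serves on http://127.0.0.1:8000 by default)
uvicorn app:app --reload

# In a second terminal, start the Streamlit frontend
streamlit run streamlit_app.py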

Architecture

flowchart TD
    User[User] -->|Uploads PDF| Streamlit[Streamlit Frontend]
    Streamlit -->|POST /upload| FastAPI[FastAPI Backend]
    FastAPI -->|Extract Text| PyPDF2[PyPDF2]
    PyPDF2 -->|Raw Text| HierarchicalProc[HierarchicalProcessor]
    HierarchicalProc -->|Logical Groups & Summaries| Embedder[OpenAI Embeddings]
    Embedder -->|Vectors| ChromaDB[(ChromaDB)]
    User -->|Asks Question| Streamlit
    Streamlit -->|POST /query| FastAPI
    FastAPI -->|Search| ChromaDB
    ChromaDB -->|Context Chunks| FastAPI
    FastAPI -->|Prompt + Context| OpenAI[OpenAI LLM]
    OpenAI -->|Answer| FastAPI
    FastAPI -->|Response| Streamlit
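
The `POST /query` edge in the diagram can be exercised without the Streamlit frontend. The sketch below uses only the standard library and assumes that the endpoint accepts a JSON body with a `question` field and returns JSON; neither the payload shape nor the response format is confirmed by the source. The helper names `build_query_request` and `ask` are hypothetical.

```python
import json
import urllib.request


def build_query_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a POST /query request (assumed JSON payload: {"question": ...})."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 30.0) -> dict:
    """Send the question to the backend and decode the (assumed) JSON response."""
    req = build_query_request(base_url, question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running locally, a call such as `ask("http://localhost:8000", "What does the document say about X?")` would send the question and return the decoded response body.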

How it's built

The backend is a FastAPI application that handles document uploads and query processing. It extracts text with PyPDF2, tokenizes it with NLTK, and passes it to a custom HierarchicalProcessor, which groups sentences into logical units and generates summaries compressed at a 10:1 ratio. Embeddings are stored in ChromaDB, which can run locally or be reached over HTTP. The frontend is a Streamlit application that talks to the FastAPI backend. Boto3 provides the optional S3 integration.
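
The grouping and compression idea can be sketched in a few lines. This is not the HierarchicalProcessor itself: it swaps NLTK for a naive regex sentence splitter, groups sentences by a fixed count rather than by logical unit, and assumes the 10:1 ratio is measured in words. All function names and the `group_size` parameter are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter on ., !, ? followed by whitespace (stand-in for NLTK)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def group_sentences(sentences: list[str], group_size: int = 5) -> list[str]:
    """Join consecutive sentences into fixed-size groups (assumed grouping rule)."""
    return [
        " ".join(sentences[i:i + group_size])
        for i in range(0, len(sentences), group_size)
    ]


def summary_word_budget(group: str, ratio: int = 10) -> int:
    """Word budget for a summary at ratio:1 compression, never less than 1."""
    return max(1, len(group.split()) // ratio)
```

Each group would then be summarized by the LLM within its word budget, and both the group and its summary would be embedded, which gives retrieval two levels of granularity to match against.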

How it runs

sequenceDiagram
    participant U as User
    participant S as Streamlit UI
    participant F as FastAPI Server
    participant P as HierarchicalProcessor
    participant C as ChromaDB
    participant O as OpenAI API

    Note over U, O: Document Ingestion Phase
    U->>S: Upload PDF File
    S->>F: POST /upload file
    F->>F: Parse PDF with PyPDF2
    F->>P: Process Text Hierarchically
    P->>P: Group Sentences & Generate Summaries
    P->>O: Generate Embeddings for Chunks
    O-->>P: Return Vectors
    P->>C: Store Chunks with Metadata
    C-->>F: Confirm Storage
    F-->>S: Upload Success
    S-->>U: Display Success Message

    Note over U, O: Query Phase
    U->>S: Ask Question
    S->>F: POST /query {question}
    F->>O: Generate Query Embedding
    O-->>F: Return Query Vector
    F->>C: Similarity Search
    C-->>F: Return Relevant Chunks
    F->>O: Send Prompt with Context
    O-->>F: Return Generated Answer
    F-->>S: Return Answer
    S-->>U: Display Answer
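
The query phase above reduces to two steps: rank stored chunks against the query vector, then assemble a prompt from the best matches. The sketch below illustrates both with an in-memory list in place of ChromaDB and plain cosine similarity as the ranking metric. The metric ChromaDB actually uses depends on how the collection is configured, and the prompt wording and the names `top_k` and `build_prompt` are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          store: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that numbers each context chunk (hypothetical wording)."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In the real pipeline the vectors would come from the OpenAI embeddings API and the ranking from a ChromaDB query; the prompt string is what would be sent to the LLM in the "Send Prompt with Context" step.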

How to apply & reuse

Use this template to build internal knowledge base chatbots, legal document reviewers, or academic paper assistants. It is suitable for developers needing a structured RAG implementation with advanced chunking strategies beyond simple character splitting.

At a glance

Capabilities: Document Ingestion, Semantic Chunking, Vector Search, LLM Generation, Metadata Tracking, Cloud Storage Integration
Components: FastAPI Application, Streamlit Interface, Hierarchical Processor, Enhanced Document Processor, ChromaDB Client, S3 Uploader
Tech: Python, FastAPI, Streamlit, ChromaDB, OpenAI API, PyPDF2, NLTK, Boto3, LangChain
Depends on: openai, chromadb, fastapi, streamlit, pypdf2, boto3, nltk, langchain, pydantic, uvicorn
Integrates with: AWS S3, OpenAI GPT Models, Local File System
Patterns: Retrieval-Augmented Generation (RAG), Microservices (API + UI), Hierarchical Data Processing, Vector Database Indexing
Reuse tags: rag, llm, document-chat, fastapi, streamlit, chromadb, pdf-processing
