Semantic search engine for personal GitHub repositories using embeddings and TF-IDF.
https://github.com/davidbmar/github-portfolio-search · public · shipped
A tool that indexes GitHub repositories (public and private) into a local vector store, enabling semantic search, capability clustering, and pattern discovery across a developer's portfolio. It provides a CLI, a FastAPI backend, an MCP server for AI agents, and a static web UI.
git clone https://github.com/davidbmar/github-portfolio-search.git
cd github-portfolio-search
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env to add GITHUB_TOKEN
ghps index <your-github-username>
ghps export
cd web && python3 -m http.server 8000
flowchart TD
GH[GitHub API] -->|Repos/READMEs| IDX[Indexing Pipeline]
IDX -->|Embeddings| DB[(SQLite-vec)]
IDX -->|JSON Export| WEB[Static Web UI]
DB -->|Query| API[FastAPI Server]
DB -->|Query| CLI[ghps CLI]
DB -->|Query| MCP[MCP Server]
WEB -->|Search Requests| API
WEB -->|OAuth| LAMBDA[AWS Lambda]
LAMBDA -->|Logs| S3[S3 Logs]
Python-based indexing pipeline using sentence-transformers for embeddings and SQLite-vec for vector storage. The web UI is a static SPA served from S3/CloudFront, consuming JSON exports generated by the indexer. Authentication is handled via Google OAuth Lambda functions, and AI agent integration is provided via an MCP server.
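To make the storage-and-query idea concrete, here is a minimal sketch of a vector store built on the standard-library `sqlite3` module. It is an illustration, not the project's code: the brute-force cosine scan stands in for SQLite-vec, the 3-dimensional vectors are toy values standing in for sentence-transformers embeddings, and the table name, repository names, and helpers (`pack`, `unpack`, `build_store`, `search`) are all hypothetical.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a compact float32 blob."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): float32 blob back to a tuple of floats."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_store(rows):
    """Create an in-memory table of (repo name, embedding blob) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repos (name TEXT PRIMARY KEY, embedding BLOB)")
    conn.executemany(
        "INSERT INTO repos VALUES (?, ?)",
        [(name, pack(vec)) for name, vec in rows],
    )
    return conn


def search(conn, query_vec, k=2):
    """Score every stored vector against the query and keep the top k."""
    scored = [
        (cosine(query_vec, unpack(blob)), name)
        for name, blob in conn.execute("SELECT name, embedding FROM repos")
    ]
    return sorted(scored, reverse=True)[:k]


# Toy vectors (assumption): a real index would store model embeddings.
store = build_store([
    ("voice-assistant", [0.9, 0.1, 0.0]),
    ("s3-uploader", [0.1, 0.9, 0.1]),
    ("audio-transcriber", [0.8, 0.2, 0.1]),
])
results = search(store, [1.0, 0.0, 0.0])
top_names = [name for _, name in results]
print(top_names)
```

A full scan like this is fine for a few hundred repositories; a dedicated vector extension becomes worthwhile once the table is large enough that scoring every row per query is too slow.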
sequenceDiagram
participant User
participant CLI as ghps CLI
participant Indexer
participant GH as GitHub API
participant Store as SQLite-vec
User->>CLI: ghps index username
CLI->>Indexer: Start indexing
Indexer->>GH: Fetch repos & READMEs
GH-->>Indexer: Repo metadata & content
Indexer->>Indexer: Generate embeddings
Indexer->>Store: Insert vectors & metadata
Store-->>Indexer: Confirm storage
Indexer-->>CLI: Index complete
User->>CLI: ghps search "query"
CLI->>Store: Semantic search
Store-->>CLI: Ranked results
CLI-->>User: Display results
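The index-then-search flow in the diagram can be sketched with TF-IDF alone, using only the standard library. The summary mentions TF-IDF alongside embeddings without saying how the two are combined, so the following is an assumption-laden illustration rather than the tool's actual ranking: the README snippets, the smoothed IDF formula, and the function names (`build_index`, `weigh`, `search`) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weigh(tokens, idf):
    """Turn a token list into a sparse TF-IDF vector (term -> weight)."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_index(docs):
    """Index step: compute IDF over all documents, then one vector per doc."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    n = len(docs)
    # Smoothed IDF keeps a positive weight even for terms in every document.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = {name: weigh(tokens, idf) for name, tokens in tokenized.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, idf, vectors, k=3):
    """Search step: rank documents by similarity to the query, drop zeros."""
    q = weigh(tokenize(query), idf)
    scored = sorted(
        ((cosine(q, vec), name) for name, vec in vectors.items()), reverse=True
    )
    return [(name, score) for score, name in scored[:k] if score > 0]


# Hypothetical README snippets standing in for fetched repository content.
readmes = {
    "voice-assistant": "Real-time speech recognition and voice assistant using streaming audio",
    "s3-uploader": "Upload files to S3 with presigned URLs and a Lambda backend",
    "portfolio-search": "Semantic search over GitHub repositories using embeddings",
}
idf, vectors = build_index(readmes)
hits = search("speech audio", idf, vectors)
print([name for name, _ in hits])
```

Keyword weighting of this kind only matches terms that literally appear in a README, which is the gap embedding-based search is meant to cover: a query such as "transcription" would return nothing here but could still land near a speech-recognition repository in embedding space.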
Use it to build a searchable knowledge base of your own code, find reusable patterns across projects, or expose your portfolio to AI assistants via MCP. Deploy the static site to showcase your work, or run everything locally for private code exploration.
✓ all on main — nothing unmerged.