GitHub Portfolio Search

Semantic search engine for personal GitHub repositories using embeddings and TF-IDF.

https://github.com/davidbmar/github-portfolio-search  ·  public  ·  shipped

What it is

A tool that indexes GitHub repositories (public and private) into a local vector store, enabling semantic search, capability clustering, and pattern discovery across a developer's portfolio. It provides a CLI, a FastAPI backend, an MCP server for AI agents, and a static web UI.

Quickstart

git clone https://github.com/davidbmar/github-portfolio-search.git
cd github-portfolio-search
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env to add GITHUB_TOKEN
ghps index <your-github-username>
ghps export
cd web && python3 -m http.server 8000

Architecture

flowchart TD
    GH[GitHub API] -->|Repos/READMEs| IDX[Indexing Pipeline]
    IDX -->|Embeddings| DB[(SQLite-vec)]
    IDX -->|JSON Export| WEB[Static Web UI]
    DB -->|Query| API[FastAPI Server]
    DB -->|Query| CLI[ghps CLI]
    DB -->|Query| MCP[MCP Server]
    WEB -->|Search Requests| API
    WEB -->|OAuth| LAMBDA[AWS Lambda]
    LAMBDA -->|Logs| S3[S3 Logs]

How it's built

The indexing pipeline is written in Python. It uses sentence-transformers to generate embeddings and SQLite-vec to store the vectors. The web UI is a static SPA served from S3/CloudFront that consumes JSON exports generated by the indexer. Authentication is handled by Google OAuth Lambda functions, and an MCP server provides integration with AI agents.
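The index-then-query flow described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a sentence-transformers model, a plain `sqlite3` table with JSON-encoded vectors stands in for SQLite-vec, and the table and function names are invented for the example.

```python
import hashlib
import json
import math
import sqlite3


def toy_embed(text, dim=256):
    """Hypothetical stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def open_store(path=":memory:"):
    """Open a plain SQLite table (assumed schema; the real project uses SQLite-vec)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repos ("
        "name TEXT PRIMARY KEY, readme TEXT, embedding TEXT)"
    )
    return conn


def index_repo(conn, name, readme):
    """Embed one README and upsert it with its vector."""
    conn.execute(
        "INSERT OR REPLACE INTO repos VALUES (?, ?, ?)",
        (name, readme, json.dumps(toy_embed(readme))),
    )
    conn.commit()


def search(conn, query, top_k=3):
    """Brute-force cosine search; vectors are unit length, so the dot product is the cosine."""
    q = toy_embed(query)
    scored = []
    for name, emb_json in conn.execute("SELECT name, embedding FROM repos"):
        emb = json.loads(emb_json)
        scored.append((sum(a * b for a, b in zip(q, emb)), name))
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top_k]]
```

A brute-force scan like this is workable for a personal portfolio of a few hundred repositories. A vector extension such as SQLite-vec moves the nearest-neighbour query into the database instead.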

How it runs

sequenceDiagram
    participant User
    participant CLI as ghps CLI
    participant Indexer
    participant GH as GitHub API
    participant Store as SQLite-vec
    
    User->>CLI: ghps index username
    CLI->>Indexer: Start indexing
    Indexer->>GH: Fetch repos & READMEs
    GH-->>Indexer: Repo metadata & content
    Indexer->>Indexer: Generate embeddings
    Indexer->>Store: Insert vectors & metadata
    Store-->>Indexer: Confirm storage
    Indexer-->>CLI: Index complete
    
    User->>CLI: ghps search "query"
    CLI->>Store: Semantic search
    Store-->>CLI: Ranked results
    CLI-->>User: Display results
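
The search step returns ranked results, and the project summary names TF-IDF alongside embeddings. How the two signals are combined is not stated here, so the sketch below covers only the keyword side: a minimal TF-IDF ranker with a smoothed IDF and cosine scoring. All function names are hypothetical and the whitespace tokenizer is a simplifying assumption.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (simplifying assumption)."""
    return text.lower().split()


def build_tfidf(docs):
    """Build smoothed IDF weights and one sparse TF-IDF vector per document.

    docs maps a repository name to its text.
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = {
        name: {t: freq * idf[t] for t, freq in Counter(tokens).items()}
        for name, tokens in tokenized.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, idf, vectors, top_k=3):
    """Return up to top_k repository names with a non-zero score, best first."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: freq * idf[t] for t, freq in q_counts.items()}
    scored = sorted(
        ((cosine(q_vec, vec), name) for name, vec in vectors.items()),
        reverse=True,
    )
    return [name for score, name in scored[:top_k] if score > 0.0]
```

Query terms that never appear in the indexed text are dropped, so a query made up entirely of unseen words returns an empty list rather than arbitrary results.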

How to apply & reuse

Use it to build a searchable knowledge base of your own code, find reusable patterns across projects, or expose your portfolio to AI assistants via MCP. Deploy the static site to showcase your work, or run it locally to explore private code.

At a glance

Capabilities: Semantic Code Search, Repository Clustering, AI Agent Integration (MCP), Static Site Generation, Access Control Management
Components: src/ghps/indexer.py, src/ghps/search.py, src/ghps/cli.py, src/ghps/mcp_server.py, src/ghps/api.py, web/index.html, lambda/github_oauth/handler.py
Tech: Python, SQLite-vec, sentence-transformers, FastAPI, Click, AWS Lambda, S3/CloudFront
Depends on: GitHub Personal Access Token, Python 3.9+, AWS CLI (for deployment)
Integrates with: Claude Code, Google OAuth, Telegram (notifications), GitHub Actions
Patterns: Vector Search, Static Site Generation, Serverless Authentication, CLI Tooling, Model Context Protocol (MCP)
Reuse tags: portfolio, search-engine, knowledge-base, developer-tools, ai-assistant

Repo hygiene

✓ all on main — nothing unmerged.