Video Picture Annotator Review

What it is

A Python-based automation and review tool that processes video recordings and images. It uses scene detection to split videos into scenes, annotates frames to highlight UI elements, either with AI (via Claude) or from manually written specs, and generates HTML/PDF reports. It also provides a web interface for capturing screenshots and running pixel-level image comparisons scored with SSIM (structural similarity).
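To make the SSIM score concrete, here is a minimal sketch of the formula in plain Python. It is not the tool's implementation: `global_ssim` is a hypothetical name, images are assumed to be flat sequences of 8-bit grayscale values, and the whole image is treated as a single window, whereas library implementations typically average the score over many local windows.

```python
from statistics import fmean


def global_ssim(a, b, max_value=255):
    """Single-window SSIM over two equal-length grayscale pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )
```

Identical inputs score 1.0 (up to floating-point rounding), and the score drops as brightness, contrast, or structure diverge.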

Features

Automated video analysis with scene detection and keyframe extraction
AI-powered UI annotation using Claude to identify and label elements
Visual diffing of images with SSIM scores and overlay views
Web UI for live screenshot capture and interactive report viewing
CLI pipeline for end-to-end review: extraction, annotation, and report generation
Support for both spec-driven and fully autonomous annotation modes
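The two annotation modes can be pictured as one function that prefers a spec and otherwise defers to a detector. This is only a sketch under assumptions: the spec layout (`label` plus `box`), the `Callout` type, and the `plan_callouts` and `Detector` names are all hypothetical, and the detector stands in for whatever call the tool makes to Claude.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Callout:
    label: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


# Hypothetical shape of a model-backed detector: frame path in, callouts out.
Detector = Callable[[str], list[Callout]]


def plan_callouts(
    frame_path: str,
    spec: Optional[list[dict]] = None,
    detector: Optional[Detector] = None,
) -> list[Callout]:
    """Use the spec when one is given, otherwise ask the detector."""
    if spec is not None:
        # Spec-driven mode: the caller states exactly what to highlight.
        return [Callout(item["label"], tuple(item["box"])) for item in spec]
    if detector is None:
        raise ValueError("autonomous mode needs a detector")
    # Autonomous mode: the detector decides what is worth labelling.
    return detector(frame_path)
```

Keeping both modes behind one return type means the drawing and reporting steps do not need to know where the callouts came from.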

Quickstart

pip install -e ".[dev]"
video-annotator --help
video-annotator serve

Architecture

flowchart TD
    User[User/Developer] --> CLI[CLI Interface]
    User --> WebUI[Web UI :7070]
    CLI --> Core[Core Pipeline]
    WebUI --> API[JSON API Server]
    Core --> VideoAgent[Video Agent]
    Core --> Annotator[Annotator Engine]
    Core --> Reporter[Report Renderer]
    API --> ScreenshotSvc[Screenshot Service]
    API --> DiffSvc[Diff Service]
    VideoAgent --> OpenCV[OpenCV/SceneDetect]
    Annotator --> Claude[Anthropic Claude API]
    Annotator --> Pillow[Pillow Imaging]
    Reporter --> HTML[HTML/PDF Output]

How it's built

Built in Python using Click for CLI structure, Rich for terminal UIs, and OpenCV/SceneDetect for video processing. It integrates the Anthropic API for intelligent frame annotation and Pillow for image manipulation. The web component is a lightweight server exposing JSON APIs for screenshot capture and image diffing, while the core logic is modularized into agents, annotators, and report renderers.
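As a rough illustration of what the video-processing step does, the sketch below finds scene cuts by thresholding the mean absolute difference between consecutive frames, then picks the middle frame of each scene as its keyframe. It is a simplified stand-in, not the SceneDetect or OpenCV API: `find_cuts`, `keyframes`, the default threshold of 30, and the flat grayscale frame representation are all assumptions made for the example.

```python
def find_cuts(frames, threshold=30.0):
    """Return each index i where frame i starts a new scene.

    Frames are assumed to be non-empty, equal-length sequences of
    grayscale pixel values.
    """
    cuts = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Mean absolute pixel difference between neighbouring frames.
        diff = sum(abs(p - c) for p, c in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)
    return cuts


def keyframes(frames, threshold=30.0):
    """Pick the middle frame index of each scene as its keyframe."""
    if not frames:
        return []
    starts = [0] + find_cuts(frames, threshold)
    ends = starts[1:] + [len(frames)]  # exclusive end of each scene
    return [(start + end - 1) // 2 for start, end in zip(starts, ends)]
```

For five frames where the fourth differs sharply from the third, `find_cuts` reports a single cut at index 3 and `keyframes` returns one index per scene.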

How it runs

sequenceDiagram
    participant U as User
    participant C as CLI/Web
    participant VA as VideoAgent
    participant A as Annotator
    participant LLM as Claude API
    participant R as ReportRenderer
    U->>C: video-annotator review rec.mp4
    C->>VA: Extract scenes & frames
    VA->>VA: Detect scenes (OpenCV)
    VA-->>C: List of frames/timestamps
    C->>A: Annotate frames with spec
    A->>LLM: Send frame + context
    LLM-->>A: Return callouts/labels
    A->>A: Draw highlights (Pillow)
    A-->>C: Annotated frames
    C->>R: Generate report
    R-->>U: report.html
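The order of calls in the diagram can be sketched as three plain functions wired together. Everything here is illustrative: `Frame`, `review`, and `render_report` are hypothetical names, the extract and annotate stages are passed in as callables so no video library or API key is needed, and the report is a bare HTML string rather than the tool's actual template.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Frame:
    timestamp: float  # seconds into the recording
    image_path: str
    callouts: list[str] = field(default_factory=list)


def render_report(title, frames):
    """Render annotated frames as a minimal standalone HTML page."""
    sections = []
    for frame in frames:
        items = "".join(f"<li>{escape(c)}</li>" for c in frame.callouts)
        sections.append(
            f"<section><h2>{frame.timestamp:.1f}s</h2>"
            f'<img src="{escape(frame.image_path)}" alt="">'
            f"<ul>{items}</ul></section>"
        )
    return f"<!doctype html><title>{escape(title)}</title>{''.join(sections)}"


def review(video_path, extract, annotate):
    """Run extract, then annotate, then report, in the diagram's order."""
    frames = extract(video_path)
    for frame in frames:
        frame.callouts = annotate(frame)
    return render_report(f"Review of {video_path}", frames)
```

Callout text is escaped before it reaches the page, which matters if labels come back from a model and may contain markup.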

How to apply & reuse

Use it to automate QA reviews of UI changes by comparing before/after screenshots, generate annotated walkthroughs of user sessions from screen recordings, or integrate into CI pipelines to visually verify application states using natural language queries.
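For the CI use case, one simple pattern is to turn per-screen similarity scores into a pass/fail exit code. The sketch below assumes the scores have already been collected into a dictionary; the function names and the 0.98 threshold are hypothetical and would need tuning per project.

```python
def failing_screens(scores, min_ssim=0.98):
    """Return the names of screens scoring below the threshold, sorted."""
    return sorted(name for name, score in scores.items() if score < min_ssim)


def exit_code(scores, min_ssim=0.98):
    """Return 0 when every screen passes and 1 otherwise."""
    return 1 if failing_screens(scores, min_ssim) else 0
```

A CI job could pass the result to `sys.exit` so that a visual regression fails the build and the list of failing screens ends up in the log.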

At a glance

Capabilities: Video scene detection, AI-based UI annotation, Image diffing (SSIM), Screenshot capture, HTML/PDF report generation, Web server hosting

Components: VideoAgent, Annotator, ReportRenderer, WebServer, CLI Entry Point

Tech: Python, Click, Rich, OpenCV, SceneDetect, Pillow, Anthropic API, Playwright

Depends on: ffmpeg, anthropic, opencv-python, scenedetect, click, rich, Pillow, weasyprint

Integrates with: Claude AI, Web Browsers, CI/CD Pipelines

Patterns: Pipeline Processing, Agent-Based Automation, REST-like API, Spec-Driven Execution

Reuse tags: video-processing, ui-testing, ai-annotation, visual-regression, reporting-tool