Video Picture Annotator Review

A Playwright-style tool for analyzing screen recordings, annotating UI elements with AI, and generating visual diff reports.

https://github.com/davidbmar/tool-software-video-picture-annotator-review  ·  private  ·  shipped

What it is

A Python-based automation and review tool that processes video recordings and images. It uses scene detection to break down videos, applies AI-driven annotations (via Claude) or manual specs to highlight UI elements, and generates HTML/PDF reports. It also provides a web interface for capturing screenshots and performing pixel-level image comparisons with SSIM scoring.
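The SSIM score mentioned above can be illustrated with a small pure-Python sketch. This is not the tool's implementation, which is not shown here. It is a simplified single-window variant that computes one score over the whole image, whereas production SSIM implementations usually average scores over sliding local windows. The function name `global_ssim` and the flat-list pixel format are assumptions made for illustration.

```python
from statistics import fmean


def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over two equal-length grayscale pixel sequences.

    Simplification: one global window instead of the usual sliding windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")

    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical inputs score 1.0 (up to floating-point rounding), and the score drops as luminance, contrast, or structure diverge.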

Features

Quickstart

pip install -e ".[dev]"
video-annotator --help
video-annotator serve

Architecture

flowchart TD
    User[User/Developer] --> CLI[CLI Interface]
    User --> WebUI[Web UI :7070]
    CLI --> Core[Core Pipeline]
    WebUI --> API[JSON API Server]
    Core --> VideoAgent[Video Agent]
    Core --> Annotator[Annotator Engine]
    Core --> Reporter[Report Renderer]
    API --> ScreenshotSvc[Screenshot Service]
    API --> DiffSvc[Diff Service]
    VideoAgent --> OpenCV[OpenCV/SceneDetect]
    Annotator --> Claude[Anthropic Claude API]
    Annotator --> Pillow[Pillow Imaging]
    Reporter --> HTML[HTML/PDF Output]

How it's built

Built in Python using Click for CLI structure, Rich for terminal UIs, and OpenCV/SceneDetect for video processing. It integrates the Anthropic API for intelligent frame annotation and Pillow for image manipulation. The web component is a lightweight server exposing JSON APIs for screenshot capture and image diffing, while the core logic is modularized into agents, annotators, and report renderers.
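As a rough illustration of what a lightweight JSON API for image diffing could look like, here is a standard-library sketch. The route `/api/diff`, the payload shape, and the helper names `diff_summary` and `make_server` are all hypothetical; the project's real endpoints and server framework are not documented in this summary. Only the port, 7070, comes from the architecture diagram.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def diff_summary(payload):
    """Hypothetical diff logic: count differing pixels in two flat pixel lists."""
    before, after = payload["before"], payload["after"]
    if len(before) != len(after):
        raise ValueError("images must have the same number of pixels")
    changed = sum(1 for a, b in zip(before, after) if a != b)
    return {"changed_pixels": changed, "total_pixels": len(before)}


class ApiHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/diff":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = diff_summary(json.loads(self.rfile.read(length)))
            status = 200
        except (KeyError, TypeError, ValueError) as exc:
            # Malformed JSON, missing keys, or mismatched sizes become a 400.
            result = {"error": str(exc)}
            status = 400
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def make_server(port=7070):
    """Build (but do not start) a server bound to localhost."""
    return ThreadingHTTPServer(("127.0.0.1", port), ApiHandler)
```

Calling `make_server().serve_forever()` would start it. Keeping the diff logic in a plain function, separate from the handler, makes it easy to reuse from the CLI and to test without a socket.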

How it runs

sequenceDiagram
    participant U as User
    participant C as CLI/Web
    participant VA as VideoAgent
    participant A as Annotator
    participant LLM as Claude API
    participant R as ReportRenderer
    U->>C: video-annotator review rec.mp4
    C->>VA: Extract scenes & frames
    VA->>VA: Detect scenes (OpenCV)
    VA-->>C: List of frames/timestamps
    C->>A: Annotate frames with spec
    A->>LLM: Send frame + context
    LLM-->>A: Return callouts/labels
    A->>A: Draw highlights (Pillow)
    A-->>C: Annotated frames
    C->>R: Generate report
    R-->>U: report.html
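The sequence above can be sketched end to end in plain Python. This is a toy model under stated assumptions, not the project's code. Scene detection is replaced by a simple mean-absolute-difference threshold (standing in for OpenCV/SceneDetect), the Claude call is replaced by a caller-supplied `labeler` function, and the report is a bare HTML list. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Frame:
    timestamp: float  # seconds into the recording
    pixels: list      # flat, non-empty list of grayscale values (0-255)


@dataclass
class AnnotatedFrame:
    frame: Frame
    label: str


def detect_scene_starts(frames, threshold=30.0):
    """Keep the first frame, plus any frame that differs sharply from its predecessor."""
    starts = []
    previous = None
    for frame in frames:
        if previous is None:
            starts.append(frame)
        else:
            total = sum(abs(a - b) for a, b in zip(frame.pixels, previous.pixels))
            if total / len(frame.pixels) > threshold:
                starts.append(frame)
        previous = frame
    return starts


def annotate(frames, labeler):
    """Attach a label to each frame; `labeler` stands in for the LLM call."""
    return [AnnotatedFrame(frame, labeler(frame)) for frame in frames]


def render_report(annotated):
    """Render one list item per annotated frame, escaping label text."""
    rows = "\n".join(
        f"<li>{item.frame.timestamp:.1f}s: {escape(item.label)}</li>"
        for item in annotated
    )
    return f"<html><body><ul>\n{rows}\n</ul></body></html>"


def review(frames, labeler):
    """Run the three stages in order: detect, annotate, report."""
    return render_report(annotate(detect_scene_starts(frames), labeler))
```

Passing the labeler in as a function keeps the pipeline testable offline: a stub can replace the model call in tests, and a real API client can be supplied in production.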

How to apply & reuse

Use it to automate QA reviews of UI changes by comparing before/after screenshots, generate annotated walkthroughs of user sessions from screen recordings, or integrate into CI pipelines to visually verify application states using natural language queries.
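For the CI use case, one plausible pattern is a gate that fails the build when any screen's similarity score drops below a threshold. The sketch below assumes the scores have already been computed and collected into a dictionary. The function name `gate` and the 0.98 default are illustrative choices, not values taken from the project.

```python
def gate(scores, min_score=0.98):
    """Return (exit_code, failures) for a mapping of screen name -> similarity score.

    exit_code is 0 when every score meets the threshold, 1 otherwise.
    """
    failures = sorted(name for name, score in scores.items() if score < min_score)
    return (1 if failures else 0), failures
```

A CI step could call `gate(...)`, print the failing screen names, and pass the exit code to `sys.exit` so the job status reflects the visual check.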

At a glance

Capabilities: Video scene detection, AI-based UI annotation, Image diffing (SSIM), Screenshot capture, HTML/PDF report generation, Web server hosting
Components: VideoAgent, Annotator, ReportRenderer, WebServer, CLI Entry Point
Tech: Python, Click, Rich, OpenCV, SceneDetect, Pillow, Anthropic API, Playwright
Depends on: ffmpeg, anthropic, opencv-python, scenedetect, click, rich, Pillow, weasyprint
Integrates with: Claude AI, Web Browsers, CI/CD Pipelines
Patterns: Pipeline Processing, Agent-Based Automation, REST-like API, Spec-Driven Execution
Reuse tags: video-processing, ui-testing, ai-annotation, visual-regression, reporting-tool

Repo hygiene

✓ all on main — nothing unmerged.