A Playwright-style tool for analyzing screen recordings, annotating UI elements with AI, and generating visual diff reports.
https://github.com/davidbmar/tool-software-video-picture-annotator-review · private · shipped
A Python-based automation and review tool that processes video recordings and images. It uses scene detection to split videos into scenes and extract representative frames, applies AI-driven annotations (via Claude) or manual specs to highlight UI elements, and generates HTML/PDF reports. It also provides a web interface for capturing screenshots and performing pixel-level image comparisons with SSIM scoring.
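SSIM (structural similarity) scores two images by comparing their luminance, contrast, and structure. The sketch below is not the tool's code. It is a simplified single-window ("global") SSIM over flattened grayscale pixels that uses population statistics and the commonly used constants K1=0.01 and K2=0.03. Production implementations such as scikit-image's `structural_similarity` slide a local window across the image and average the per-window scores. The function name `global_ssim` is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(a: Sequence[float], b: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window (global) SSIM between two same-size grayscale pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    # Stabilizing constants; K1=0.01 and K2=0.03 are the usual defaults.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    # Population mean, variance, and covariance over the whole image.
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    # Luminance term times the combined contrast/structure term.
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


before = [0, 50, 100, 150, 200, 250]
print(global_ssim(before, before))                     # identical images score 1.0
print(global_ssim(before, [255 - p for p in before]))  # an inverted image scores below 0
```

A score of 1.0 means the images are identical. Lower scores indicate structural change, so a review tool can flag a before/after pair whose score falls below a chosen threshold.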
pip install -e ".[dev]"
video-annotator --help
video-annotator serve
flowchart TD
User[User/Developer] --> CLI[CLI Interface]
User --> WebUI[Web UI :7070]
CLI --> Core[Core Pipeline]
WebUI --> API[JSON API Server]
Core --> VideoAgent[Video Agent]
Core --> Annotator[Annotator Engine]
Core --> Reporter[Report Renderer]
API --> ScreenshotSvc[Screenshot Service]
API --> DiffSvc[Diff Service]
VideoAgent --> OpenCV[OpenCV/SceneDetect]
Annotator --> Claude[Anthropic Claude API]
Annotator --> Pillow[Pillow Imaging]
Reporter --> HTML[HTML/PDF Output]
Built in Python using Click for CLI structure, Rich for terminal UIs, and OpenCV/SceneDetect for video processing. It integrates the Anthropic API for intelligent frame annotation and Pillow for image manipulation. The web component is a lightweight server exposing JSON APIs for screenshot capture and image diffing, while the core logic is modularized into agents, annotators, and report renderers.
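The modular split between agents, annotators, and report renderers can be sketched as three small interfaces composed into one pipeline. This is a hypothetical sketch: the component names come from the diagrams above, but the method names (`extract_frames`, `annotate`, `render`), the `Frame`/`AnnotatedFrame` types, `run_review`, and the stub classes are assumptions for illustration. They are not the project's actual API.

```python
from __future__ import annotations

import html
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Frame:
    timestamp: float  # seconds into the recording
    path: str         # where the extracted frame image was saved


@dataclass
class AnnotatedFrame:
    frame: Frame
    labels: list[str] = field(default_factory=list)  # callouts for this frame


# Hypothetical interfaces mirroring VideoAgent, Annotator, and ReportRenderer.
class VideoAgent(Protocol):
    def extract_frames(self, video_path: str) -> list[Frame]: ...


class Annotator(Protocol):
    def annotate(self, frames: list[Frame]) -> list[AnnotatedFrame]: ...


class ReportRenderer(Protocol):
    def render(self, frames: list[AnnotatedFrame]) -> str: ...


def run_review(video_path: str, agent: VideoAgent, annotator: Annotator,
               renderer: ReportRenderer) -> str:
    """Frames -> annotations -> report, in the order the sequence diagram shows."""
    return renderer.render(annotator.annotate(agent.extract_frames(video_path)))


# Stub implementations, so the pipeline runs without OpenCV or the Claude API.
class FixedAgent:
    def extract_frames(self, video_path: str) -> list[Frame]:
        return [Frame(0.0, f"{video_path}-0000.png"), Frame(4.2, f"{video_path}-0001.png")]


class EchoAnnotator:
    def annotate(self, frames: list[Frame]) -> list[AnnotatedFrame]:
        return [AnnotatedFrame(f, [f"scene at {f.timestamp:.1f}s"]) for f in frames]


class HtmlRenderer:
    def render(self, frames: list[AnnotatedFrame]) -> str:
        items = "".join(
            f"<li><img src='{html.escape(a.frame.path)}'> {html.escape(', '.join(a.labels))}</li>"
            for a in frames
        )
        return f"<html><body><ul>{items}</ul></body></html>"


report = run_review("rec.mp4", FixedAgent(), EchoAnnotator(), HtmlRenderer())
print(report)
```

Because each stage depends only on an interface, a manual-spec annotator could replace a Claude-backed one, or a PDF renderer could replace the HTML one, without changing `run_review`.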
sequenceDiagram
participant U as User
participant C as CLI/Web
participant VA as VideoAgent
participant A as Annotator
participant LLM as Claude API
participant R as ReportRenderer
U->>C: video-annotator review rec.mp4
C->>VA: Extract scenes & frames
VA->>VA: Detect scenes (OpenCV)
VA-->>C: List of frames/timestamps
C->>A: Annotate frames with spec
A->>LLM: Send frame + context
LLM-->>A: Return callouts/labels
A->>A: Draw highlights (Pillow)
A-->>C: Annotated frames
C->>R: Generate report
R-->>U: report.html
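The "Detect scenes" step can be illustrated with a simple content-change detector. This is a stand-in, not the project's implementation: OpenCV and PySceneDetect decode real video and use more robust detectors. Here each frame is a flattened list of grayscale pixels, and a new scene starts whenever the mean absolute difference from the previous frame exceeds a threshold. The names `detect_scene_starts` and `mean_abs_diff` and the default threshold of 30 are hypothetical.

```python
from __future__ import annotations

from typing import Sequence

GrayFrame = Sequence[int]  # one decoded frame as flattened grayscale pixels (0-255)


def mean_abs_diff(a: GrayFrame, b: GrayFrame) -> float:
    """Average per-pixel intensity change between two frames of equal size."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_scene_starts(frames: Sequence[GrayFrame], fps: float,
                        threshold: float = 30.0) -> list[float]:
    """Return the timestamps (in seconds) at which a new scene begins."""
    if not frames:
        return []
    starts = [0.0]  # the first frame always opens a scene
    for i in range(1, len(frames)):
        # A large jump in average intensity is treated as a hard cut.
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            starts.append(i / fps)
    return starts


dark, bright = [10] * 4, [200] * 4
print(detect_scene_starts([dark, dark, dark, bright, bright], fps=1.0))  # [0.0, 3.0]
```

The returned timestamps correspond to the "List of frames/timestamps" that the VideoAgent hands back to the CLI before annotation begins.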
Use it to automate QA reviews of UI changes by comparing before/after screenshots, generate annotated walkthroughs of user sessions from screen recordings, or integrate into CI pipelines to visually verify application states using natural language queries.
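For the before/after QA use case, a pixel-level diff reports how much of the screen changed and where. The sketch below is an illustration under stated assumptions, not the tool's diff service. It compares two same-size grayscale grids and returns the fraction of changed pixels plus a bounding box, using the (left, top, right, bottom) convention with exclusive right/bottom edges. With Pillow, `ImageChops.difference` followed by `Image.getbbox()` produces a similar box. The function name `diff_region` and the `tolerance` parameter are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Sequence, Tuple

Grid = Sequence[Sequence[int]]  # rows of grayscale pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def diff_region(before: Grid, after: Grid, tolerance: int = 0) -> tuple[float, Optional[Box]]:
    """Compare two same-size screenshots pixel by pixel.

    Returns the fraction of pixels whose values differ by more than `tolerance`,
    plus the bounding box of those pixels (None when nothing changed).
    """
    if len(before) != len(after) or any(len(ra) != len(rb) for ra, rb in zip(before, after)):
        raise ValueError("images must have identical dimensions")
    xs: list[int] = []
    ys: list[int] = []
    total = 0
    for y, (row_a, row_b) in enumerate(zip(before, after)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            total += 1
            if abs(pa - pb) > tolerance:  # small differences (e.g. anti-aliasing) are ignored
                xs.append(x)
                ys.append(y)
    if not xs:
        return 0.0, None
    return len(xs) / total, (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


before = [[0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[2][1] = after[3][2] = 255  # simulate a small UI change near the bottom of the screen
print(diff_region(before, after))  # (0.125, (1, 2, 3, 4))
```

The bounding box tells a report where to draw a highlight. The changed-pixel fraction, or an SSIM score, can serve as a pass/fail signal in a CI check.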