youtube_commercial_detector

A utility that estimates YouTube ad frequency by downloading videos and analyzing their text transcriptions for commercial keywords.

https://github.com/davidbmar/youtube_commercial_detector  ·  public  ·  shipped

What it is

A Python script that downloads a YouTube video together with its subtitles or auto-generated captions, then scans the caption text for keywords and phrases typical of commercials. The output is a count of likely ad occurrences in the video.
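The keyword scan described above can be sketched as follows. This is a minimal illustration, not the repository's code: the phrase list `AD_PHRASES` and the function name `count_ad_mentions` are hypothetical, and the real script may use different phrases or matching rules.

```python
import re

# Hypothetical phrase list; the repository's actual keywords are not documented here.
AD_PHRASES = [
    "sponsored by",
    "use code",
    "promo code",
    "link in the description",
    "free trial",
]

# One case-insensitive pattern with word boundaries around the alternatives.
AD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in AD_PHRASES) + r")\b",
    re.IGNORECASE,
)


def count_ad_mentions(transcript: str) -> int:
    """Return the number of non-overlapping ad-phrase matches in the text."""
    return len(AD_PATTERN.findall(transcript))


sample = (
    "This video is sponsored by ExampleVPN. "
    "Use code DEMO for a free trial. Now back to the build."
)
print(count_ad_mentions(sample))  # prints 3
```

A phrase count like this measures mentions of ad-like language, so a single sponsor read that uses several phrases is counted more than once.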

Features

Quickstart

git clone https://github.com/davidbmar/youtube_commercial_detector.git
cd youtube_commercial_detector
pip install -r requirements.txt
python main.py <youtube_url>

Architecture

flowchart TD
    A[User Input URL] --> B[Downloader Module]
    B --> C{Caption Available?}
    C -->|Yes| D[Extract Text]
    C -->|No| E[Auto-Generate/Transcribe]
    D --> F[Text Parser]
    E --> F
    F --> G[Keyword Matcher]
    G --> H[Ad Counter]
    H --> I[Report Output]
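The "Extract Text" step in the flowchart could look roughly like this, assuming the captions arrive as WebVTT, a common format for YouTube subtitles. The function name `vtt_to_text` is hypothetical, and the repository may handle a different caption format.

```python
import re

# Matches WebVTT cue timing lines such as "00:00:01.000 --> 00:00:04.000".
TIMING_LINE = re.compile(r"^(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# Matches inline tags such as <c> or <00:00:01.500> that auto-captions embed.
INLINE_TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Strip the header, timing lines, and inline tags from WebVTT content."""
    kept = []
    for line in vtt.splitlines():
        stripped = line.strip()
        # Skip blank lines and the file header.
        if not stripped or stripped.startswith("WEBVTT"):
            continue
        # Skip the metadata lines YouTube adds under the header.
        if stripped.startswith(("Kind:", "Language:")):
            continue
        if TIMING_LINE.match(stripped):
            continue
        text = INLINE_TAG.sub("", stripped).strip()
        # Auto-generated captions often repeat the previous line; skip exact repeats.
        if text and (not kept or kept[-1] != text):
            kept.append(text)
    return " ".join(kept)
```

Dropping consecutive duplicate lines matters for counting: rolling auto-captions would otherwise inflate the number of keyword matches.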

How it's built

The project relies on an external command-line downloader for media acquisition (likely youtube-dl or yt-dlp) and on standard Python string processing for text analysis. It runs as a linear pipeline: download media and captions, extract text, scan for ad indicators, report the count.
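As an illustration of the download stage, the sketch below builds a yt-dlp command line from Python. It assumes yt-dlp is the downloader (the description above only says "likely") and fetches captions without the media file, which is enough for text analysis; the repository itself may download the full video. The function names and the `captions` output directory are hypothetical.

```python
import subprocess


def build_caption_command(url: str, out_dir: str = "captions", lang: str = "en") -> list[str]:
    """Build a yt-dlp invocation that saves captions without the video file."""
    return [
        "yt-dlp",
        "--skip-download",     # captions only, no media file
        "--write-subs",        # uploader-provided subtitles, if any
        "--write-auto-subs",   # also request auto-generated captions
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", f"{out_dir}/%(id)s.%(ext)s",
        url,
    ]


def fetch_captions(url: str) -> None:
    """Run the command; raises CalledProcessError if yt-dlp exits non-zero."""
    subprocess.run(build_caption_command(url), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.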

How it runs

sequenceDiagram
    participant User
    participant MainScript
    participant Downloader
    participant Parser
    participant Output
    
    User->>MainScript: Provide YouTube URL
    MainScript->>Downloader: Download video & captions
    Downloader-->>MainScript: Return text file path
    MainScript->>Parser: Load and read text
    Parser->>Parser: Scan for ad keywords
    Parser-->>MainScript: Return match count
    MainScript->>Output: Print ad frequency

How to apply & reuse

Use this tool for media analysis, academic research into advertising density on specific channels, or personal curiosity about ad load in long-form content. It is not a real-time blocker but a post-hoc analyzer.
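For comparing advertising density across videos of different lengths, the raw match count can be normalized by duration. The helper below is a hypothetical addition, not something the tool is documented to provide; it assumes the video duration in seconds is known from elsewhere.

```python
def ad_mentions_per_hour(match_count: int, duration_seconds: float) -> float:
    """Normalize a keyword match count to mentions per hour of video."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return match_count * 3600.0 / duration_seconds


# Example: 6 matches in a 90-minute video.
print(ad_mentions_per_hour(6, 90 * 60))  # prints 4.0
```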

At a glance

Capabilities: YouTube content retrieval, Subtitle extraction, Text pattern matching, Ad frequency estimation
Components: Download handler, Text extractor, Keyword scanner, Result formatter
Tech: Python, youtube-dl/yt-dlp, Regular Expressions, File I/O
Depends on: Python 3.x, youtube-dl or yt-dlp, requests
Integrates with: YouTube Data API (indirectly via downloader), Local file system
Patterns: Pipeline processing, Keyword filtering, Script automation
Reuse tags: media-analysis, youtube-tools, text-mining, ad-detection, python-script

Repo hygiene

✓ all on main — nothing unmerged.