A utility that estimates YouTube ad frequency by downloading videos and scanning their transcripts for commercial keywords.
https://github.com/davidbmar/youtube_commercial_detector · public · shipped
This Python script downloads a YouTube video together with its subtitles or auto-generated captions, then scans the caption text for keywords and phrases typically associated with commercials. It reports a count of likely ad occurrences in the video.
git clone https://github.com/davidbmar/youtube_commercial_detector.git
cd youtube_commercial_detector
pip install -r requirements.txt
python main.py <youtube_url>
flowchart TD
A[User Input URL] --> B[Downloader Module]
B --> C{Caption Available?}
C -->|Yes| D[Extract Text]
C -->|No| E[Auto-Generate/Transcribe]
D --> F[Text Parser]
E --> F
F --> G[Keyword Matcher]
G --> H[Ad Counter]
H --> I[Report Output]
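The "Extract Text" step in the flowchart can be sketched as follows, assuming the captions arrive as WebVTT. The repository's actual caption format is not stated here, and `vtt_to_text` is a hypothetical helper name, not a function from the project.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
CUE_TIMING = re.compile(r"^\s*(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# Matches inline markup such as <c> or <00:00:01.500> found in auto-captions.
INLINE_TAG = re.compile(r"<[^>]+>")
# Header and metadata prefixes that carry no spoken text.
SKIP_PREFIXES = ("WEBVTT", "NOTE", "Kind:", "Language:")


def vtt_to_text(vtt: str) -> str:
    """Reduce WebVTT caption content to plain spoken text (assumed format)."""
    lines = []
    for raw in vtt.splitlines():
        line = INLINE_TAG.sub("", raw).strip()
        if not line or line.startswith(SKIP_PREFIXES):
            continue
        if CUE_TIMING.match(line):
            continue
        # Auto-generated captions often repeat the previous line; keep one copy.
        if lines and lines[-1] == line:
            continue
        lines.append(line)
    return " ".join(lines)


sample = """WEBVTT
Kind: captions
Language: en

00:00:01.000 --> 00:00:04.000 align:start
this video is<00:00:02.000><c> sponsored</c>

00:00:04.000 --> 00:00:06.000
this video is sponsored
by Example
"""
print(vtt_to_text(sample))
```

This sketch does not handle numeric cue identifiers, and it would also drop a spoken line that happens to begin with one of the skipped prefixes.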
The project relies on an external command-line tool for media acquisition (likely youtube-dl or yt-dlp) and on standard Python string processing for text analysis. It runs as a linear pipeline: download media and captions, extract text, scan for ad indicators, report the count.
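As an illustration of the "scan for ad indicators" stage, here is a minimal sketch of a phrase counter. The phrase list and the name `count_ad_indicators` are assumptions made for this example; the project's real keyword list and matching rules are not shown in this description.

```python
import re

# Hypothetical indicator phrases; the project's real list may differ.
AD_PHRASES = ("sponsored by", "promo code", "use code", "link in the description")


def count_ad_indicators(text: str, phrases=AD_PHRASES) -> dict:
    """Count case-insensitive, whole-phrase occurrences of each indicator."""
    counts = {}
    for phrase in phrases:
        # Allow any run of whitespace between the words of a phrase.
        body = r"\s+".join(re.escape(word) for word in phrase.split())
        pattern = r"\b" + body + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


transcript = "This video is Sponsored by Example. Use code SAVE10, that is promo   code SAVE10."
counts = count_ad_indicators(transcript)
print(counts, sum(counts.values()))
```

Counting phrase hits is only a rough proxy for ad breaks: one sponsor read may trigger several phrases, so a real counter would probably need to group nearby matches.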
sequenceDiagram
participant User
participant MainScript
participant Downloader
participant Parser
participant Output
User->>MainScript: Provide YouTube URL
MainScript->>Downloader: Download video & captions
Downloader-->>MainScript: Return text file path
MainScript->>Parser: Load and read text
Parser->>Parser: Scan for ad keywords
Parser-->>MainScript: Return match count
MainScript->>Output: Print ad frequency
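The Downloader step in the sequence above might look like the sketch below if the external tool is yt-dlp, which is an assumption because the description only says "likely youtube-dl or yt-dlp". The helper names `build_caption_command` and `fetch_captions`, the English language choice, and the output template are all hypothetical.

```python
import subprocess
from pathlib import Path


def build_caption_command(url: str, out_dir: str, skip_video: bool = False) -> list:
    """Build a yt-dlp invocation that requests manual and automatic captions."""
    cmd = [
        "yt-dlp",
        "--write-subs",       # uploader-provided subtitles, when present
        "--write-auto-subs",  # auto-generated captions as a fallback
        "--sub-langs", "en",  # assumed language
        "--sub-format", "vtt",
        "-o", str(Path(out_dir) / "%(id)s.%(ext)s"),
    ]
    if skip_video:
        cmd.append("--skip-download")  # fetch captions without the media file
    cmd.append(url)
    return cmd


def fetch_captions(url: str, out_dir: str) -> list:
    """Run yt-dlp and return the caption files it wrote (needs yt-dlp on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_caption_command(url, out_dir), check=True)
    return sorted(Path(out_dir).glob("*.vtt"))


print(build_caption_command("https://www.youtube.com/watch?v=VIDEO_ID", "captions"))
```

Separating command construction from execution keeps the first function testable without network access.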
Use this tool for media analysis, academic research into advertising density on specific channels, or personal curiosity about ad load in long-form content. It is not a real-time blocker but a post-hoc analyzer.
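For comparing ad load across videos of different lengths, a raw match count can be normalized by duration. The sketch below assumes the video duration in seconds is available from elsewhere; `ads_per_hour` is a hypothetical helper, not part of the project.

```python
def ads_per_hour(match_count: int, duration_seconds: float) -> float:
    """Normalize a match count to matches per hour of video."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return match_count * 3600 / duration_seconds


# Example: 6 likely ad occurrences in a 45-minute (2700 s) video.
print(ads_per_hour(6, 2700))  # 8.0
```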
✓ all on main — nothing unmerged.