Huberman Lab Podcast Transcripts

What it is

This repository provides text transcripts covering the first 30 episodes of the Huberman Lab Podcast, with episodes 23, 26, and 29 excluded. The content is derived from YouTube auto-generated captions, cleaned and formatted into readable MS Word (.docx) and Markdown (.md) files. It serves as a static reference archive for listeners who prefer to read, or who want to search for specific topics discussed by Dr. Andrew Huberman.

Features

Transcripts for Episodes 1–30 in Markdown and MS Word formats
Cleaned and formatted text derived from YouTube captions
Excludes non-expository episodes (26, 29) and auto-only episode (23)
Static archive suitable for offline reading and text analysis
No software dependencies required to access content

Quickstart

git clone https://github.com/davidbmar/Huberman-Lab-Podcast-Transcripts.git
cd Huberman-Lab-Podcast-Transcripts
ls *.md
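
Once the repository is cloned, the Markdown transcripts can be searched for a topic with a few lines of Python. The following is a minimal sketch, not something shipped with the repository: the function name `search_transcripts` is hypothetical, and it assumes the `.md` files sit at the top level of the directory you pass in.

```python
from pathlib import Path


def search_transcripts(transcript_dir, keyword):
    """Yield (file name, line number, line text) for each case-insensitive match.

    Assumption: transcripts are .md files directly inside transcript_dir.
    """
    needle = keyword.lower()
    for path in sorted(Path(transcript_dir).glob("*.md")):
        # errors="replace" keeps the search going if a file has odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, line_number, line.strip()
```

For example, `list(search_transcripts(".", "dopamine"))` run from the repository root would collect every matching line; "dopamine" is only an illustrative keyword.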

Architecture

flowchart TD
    A[YouTube Video] -->|Auto-Captions| B(Raw Transcript Data)
    B -->|Manual Cleaning & Formatting| C{Formatted Output}
    C -->|Export| D[Markdown Files .md]
    C -->|Export| E[MS Word Files .docx]
    D --> F[GitHub Repository]
    E --> F

How it's built

The project consists of static data files. Transcripts were generated by extracting auto-generated captions from YouTube, then manually processed and formatted into structured documents. No executable code, build scripts, or dynamic generation tools are included in the repository.

How it runs

sequenceDiagram
    participant Y as YouTube
    participant C as Creator
    participant R as Repository
    participant U as User
    Y->>C: Provide Auto-Generated Captions
    C->>C: Clean and Format Text
    C->>R: Upload .md and .docx files
    U->>R: Clone or Download Repository
    R->>U: Return Transcript Files

How to apply & reuse

Users can download individual transcript files for offline reading, use them for personal study notes, or import the text into note-taking applications. Researchers or developers can use these texts as a dataset for natural language processing tasks, such as topic modeling or sentiment analysis, related to health and neuroscience content.
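
As a first step toward the kind of text analysis described above, the sketch below counts the most frequent terms across the Markdown transcripts. It is an illustration under stated assumptions rather than a recommended pipeline: `top_terms`, `tokenize`, and the tiny `STOP_WORDS` set are hypothetical names introduced here, the transcripts are assumed to be `.md` files in a single directory, and a file named `README.md` is assumed to be documentation and is skipped. Plain word counting is used as a stand-in; real topic modeling or sentiment analysis would need a dedicated library.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately small stop-word list; extend it for real analysis.
STOP_WORDS = {
    "the", "and", "for", "that", "this", "with", "you", "are", "was",
    "have", "not", "but", "can", "from", "they", "your", "about",
}


def tokenize(text):
    """Lowercase the text and return runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(transcript_dir, n=10):
    """Return the n most common non-stop-word terms across all .md transcripts."""
    counts = Counter()
    for path in sorted(Path(transcript_dir).glob("*.md")):
        # Assumption: README.md is documentation, not a transcript.
        if path.name.lower() == "readme.md":
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Drop stop words and very short tokens (length 2 or less).
        counts.update(
            token for token in tokenize(text)
            if token not in STOP_WORDS and len(token) > 2
        )
    return counts.most_common(n)
```

Calling `top_terms(".", n=20)` from the repository root would return a list of `(term, count)` pairs. Because the source text comes from auto-generated captions, expect some transcription noise in the counts.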

At a glance

Capabilities: none listed

Components: Markdown Transcripts, MS Word Transcripts, README Documentation

Tech: Markdown, Microsoft Word

Depends on: none listed

Integrates with: none listed

Patterns: Static Data Archive, Content Curation

Reuse tags: dataset, transcripts, podcast, health, neuroscience, static-content

Repo hygiene

✓ Everything is on main; nothing is unmerged.