
Prairie Home Archive — Documentation

How this site was built, what it does, and how to deploy it

1. Project Overview

The Prairie Home Companion Archive is a full-stack web application that catalogs and streams 1,136 episodes of A Prairie Home Companion (1985–2017). Each episode includes metadata (date, venue, host, tags) and, where available, an MP3 audio stream; many episodes also include timestamped rundowns listing every skit and musical number.

The app reads directly from a 74 MB SQLite database (data/prairiehome.db) using better-sqlite3 in read-only mode. There is no ORM — all queries are plain SQL in src/lib/db.ts.

Key Features

  • Browse 1,136 episodes with year/audio/rundown filters and full-text search
  • Deep search across titles, descriptions, venues, tags, and rundown content
  • Persistent audio player that survives all client-side navigations
  • Clickable rundown timestamps that seek the audio player (2 formats supported)
  • Shareable deep links like /episode/59159#seek-00:08:37
  • Dark/light theme persisted to localStorage
  • Keyboard shortcuts for playback (Space, arrows, M)
  • Volume, mute, and playback speed controls
  • Auto-submitting search with debounce (no Search button)
  • JSON REST API for episodes, search, and statistics

2. Tech Stack

Layer        | Technology                                       | Version
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Framework    | Next.js (App Router)                             | 16.2.7
UI Library   | React                                            | 19.2.4
Database     | SQLite via better-sqlite3                        | 12.10.0
Styling      | Tailwind CSS (v4) + custom CSS variables         | 4.x
Fonts        | Inter, EB Garamond, Public Sans (Google Fonts)   |
TypeScript   | Strict mode                                      | 5.x
Lint         | ESLint (eslint-config-next)                      | 9.x
Python Tools | Scrapers, audio download, STT, ML (see §12)      | 3.x

3. Architecture & File Structure

prairiehome-next/
├── data/
│   ├── prairiehome.db                # SQLite DB (74 MB, read-only)
│   └── segment_analysis.json         # Recurring segment data (237 KB)
│
├── scripts/                          # Python data pipeline
│   ├── db_config.py                  # Shared DB path helper
│   ├── scrape_prairiehome.py         # Original web scraper
│   ├── patch_prairiehome.py          # Rundown/metadata patcher
│   ├── download_audio.py             # MP3 downloader
│   ├── transcribe_episodes.py        # Deepgram STT transcription
│   ├── detect_audio_events.py        # Skit boundary detection
│   ├── train_segment_boundaries.py   # ML boundary prediction
│   ├── app.py                        # Old Flask app (reference)
│   └── requirements.txt              # Python deps
│
├── src/
│   ├── app/                          # Next.js App Router pages
│   │   ├── globals.css               # Tailwind + CSS variables
│   │   ├── layout.tsx                # Root layout (nav + PlayerBar)
│   │   ├── page.tsx                  # Browse page (homepage)
│   │   ├── episode/[showId]/         # Episode detail + rundown
│   │   ├── search/page.tsx           # Deep search page
│   │   ├── stats/page.tsx            # Statistics dashboard
│   │   └── api/                      # REST API routes
│   │       ├── episodes/route.ts
│   │       ├── episodes/[showId]/route.ts
│   │       ├── search/route.ts
│   │       └── stats/route.ts
│   ├── components/
│   │   ├── PlayerBar.tsx             # Fixed bottom player
│   │   ├── PlayerContext.tsx         # Audio state + context provider
│   │   ├── EpisodeCard.tsx           # Grid card component
│   │   ├── RundownContent.tsx        # Interactive rundown renderer
│   │   ├── AutoSearchForm.tsx        # Debounced auto-submit form
│   │   └── ThemeToggle.tsx           # Dark/light toggle
│   └── lib/
│       ├── db.ts                     # All SQLite queries
│       └── utils.ts                  # Palette, colors, formatting
│
├── next.config.ts                    # standalone output + allowed origins
├── deploy.sh                         # Build + bundle deployment script
├── DEPLOY.md                         # Deployment guide
└── package.json

The app is a hybrid: server components handle data fetching (browse, search, episode detail, stats), while client components (PlayerBar, EpisodeCard, RundownContent) handle interactivity. The PlayerProvider wraps the entire app via layout.tsx so audio state persists across navigations.

4. Database Schema

The database at data/prairiehome.db contains four tables.

episodes

The primary table — 1,136 rows, one per episode.

Column           | Type      | Notes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
id               | INTEGER   | Auto-increment PK
show_id          | TEXT      | UNIQUE, numeric ID from prairiehome.org
title            | TEXT      | Episode title
page_title       | TEXT      | h1 from show page (e.g. "Classic Rebroadcast: …")
show_date        | TEXT      | Display date ("November 25, 2017")
show_date_parsed | DATE      | ISO format (2017-11-25)
audio_url        | TEXT      | MP3 stream URL from play.publicradio.org
description      | TEXT      | Story body text
tags             | TEXT      | Pipe-separated tags
venue            | TEXT      | e.g. "The Fitzgerald Theater"
host             | TEXT      | e.g. "Garrison Keillor"
episode_url      | TEXT      | Full show page URL
listing_url      | TEXT      | Page where show link was found
duration         | TEXT      | Formatted as HH:MM:SS
rundown_url      | TEXT      | URL to the rundown page
rundown_content  | TEXT      | Full HTML of rundown (timestamps + segment names)
created_at       | TIMESTAMP |
updated_at       | TIMESTAMP |

transcriptions

Transcribed audio (Deepgram STT). Keyed to episodes.show_id.

Column           | Type    | Notes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
show_id          | TEXT    | → episodes.show_id
transcript       | TEXT    | Full transcription text
utterances_json  | TEXT    | JSON: [{word, start, end, confidence}…]
words_json       | TEXT    | JSON: word-level timestamps
model            | TEXT    | Deepgram model used
duration_seconds | REAL    |
processing_time  | REAL    |
word_count       | INTEGER |

audio_events

Detected applause/music/intermission breaks from transcript gaps.

Column           | Type | Notes
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
show_id          | TEXT | → episodes.show_id
event_type       | TEXT | "applause" / "music" / "intermission"
start_seconds    | REAL |
end_seconds      | REAL |
duration_seconds | REAL |
confidence_score | REAL | 0–1
context_before   | TEXT | 120 chars before gap
context_after    | TEXT | 120 chars after gap

scrape_log

Audit log from the Python scraper.

id, url, status (ok/fetch_error), items_found, scraped_at

5. Pages & Routes

Route             | Type             | Description
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
/                 | Server (dynamic) | Browse all episodes. Supports ?q, ?year, ?audio, ?rundown, ?page filters. Shows card grid with search/filter bar and pagination.
/search           | Server (dynamic) | Deep search across all fields including rundowns. Uses multi-pass priority search: title → venue → tags → description → rundown. Displays match type badges and context snippets.
/stats            | Server (static)  | Archive statistics: total episodes, with audio, total duration, missing data, episodes per year with proportional bar chart.
/episode/[showId] | Server (dynamic) | Episode detail: metadata, play button, description, tags, and interactive rundown with clickable timestamp seek links. Also handles deep links.
/docs             | Server (static)  | This page (project documentation).
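The multi-pass priority search behind /search can be illustrated with an in-memory sketch. The production version issues SQL from src/lib/db.ts; here the EpisodeRow subset, the case-insensitive substring test, the SearchHit type, and the snippet radius are hypothetical, and the default limit of 50 is borrowed from the /api/search parameter. Each pass scans one field in the documented order and skips episodes an earlier pass already matched, so a title hit always outranks a rundown hit.

```typescript
// Hypothetical in-memory version of the /search priority search (the real one is SQL).
type MatchField = "title" | "venue" | "tags" | "description" | "rundown_content";

interface EpisodeRow {
  show_id: string;
  title: string;
  venue: string | null;
  tags: string | null; // pipe-separated, as in the episodes table
  description: string | null;
  rundown_content: string | null;
}

interface SearchHit {
  episode: EpisodeRow;
  matchType: MatchField; // the pass that found it (drives the badge)
  snippet: string;       // context around the first occurrence
}

// Priority order: title → venue → tags → description → rundown.
const PASSES: MatchField[] = ["title", "venue", "tags", "description", "rundown_content"];

function snippetAround(text: string, index: number, length: number, radius = 40): string {
  const start = Math.max(0, index - radius);
  const end = Math.min(text.length, index + length + radius);
  return (start > 0 ? "…" : "") + text.slice(start, end) + (end < text.length ? "…" : "");
}

function prioritySearch(rows: EpisodeRow[], query: string, limit = 50): SearchHit[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  const seen = new Set<string>(); // show_ids already claimed by a higher-priority pass
  const hits: SearchHit[] = [];
  for (const field of PASSES) {
    for (const row of rows) {
      if (hits.length >= limit) return hits;
      if (seen.has(row.show_id)) continue;
      const text = row[field] ?? "";
      const index = text.toLowerCase().indexOf(needle);
      if (index === -1) continue;
      seen.add(row.show_id);
      hits.push({ episode: row, matchType: field, snippet: snippetAround(text, index, needle.length) });
    }
  }
  return hits;
}
```

Deduplicating by show_id across passes is what keeps one episode from appearing once per matching field; the matchType recorded by the first pass that finds it is what a badge would show.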

6. Client Components

PlayerProvider + PlayerContext (PlayerContext.tsx)

React Context provider that owns the <audio> element and exposes all playback state + controls via usePlayer(). Key states: isPlaying, isLoading, currentTime, duration, volume, muted, speed, episode. Methods: play(), pause(), seek(), skip(), setVolume(), setMuted(), setSpeed(), togglePlay().

The play() method accepts an optional seekTo parameter for starting playback at a specific position. State is persisted to localStorage every 2 seconds and restored on mount (via an effect, avoiding hydration mismatches). Keyboard shortcuts (Space, arrows, M) are registered at the provider level.
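The save/restore cycle described above might look like the following framework-free sketch. The storage key "player-state", the PersistedPlayerState field names, the clamping rules, and the StorageLike interface (standing in for window.localStorage) are hypothetical; only the 2-second interval comes from the text. Restoring is meant to run inside an effect after mount, so the server render and the first client render both use DEFAULT_STATE.

```typescript
// Hypothetical shape of what PlayerProvider writes to localStorage.
interface PersistedEpisode { audioUrl: string; title: string; showId: string; }
interface PersistedPlayerState {
  volume: number;   // 0–1
  muted: boolean;
  speed: number;    // playbackRate
  episode: PersistedEpisode | null;
  position: number; // seconds
}

// Minimal subset of the Web Storage API so this runs outside a browser.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "player-state"; // hypothetical key name
const SAVE_INTERVAL_MS = 2000;      // "every 2 seconds"
const DEFAULT_STATE: PersistedPlayerState = { volume: 1, muted: false, speed: 1, episode: null, position: 0 };

function saveState(storage: StorageLike, state: PersistedPlayerState): void {
  storage.setItem(STORAGE_KEY, JSON.stringify(state));
}

const finiteOr = (v: unknown, fallback: number): number =>
  typeof v === "number" && Number.isFinite(v) ? v : fallback;

function parseEpisode(v: unknown): PersistedEpisode | null {
  if (typeof v !== "object" || v === null) return null;
  const { audioUrl, title, showId } = v as Record<string, unknown>;
  return typeof audioUrl === "string" && typeof title === "string" && typeof showId === "string"
    ? { audioUrl, title, showId }
    : null;
}

// Parse defensively: corrupt or partial data falls back to defaults field by field.
function restoreState(storage: StorageLike): PersistedPlayerState {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) return { ...DEFAULT_STATE };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ...DEFAULT_STATE }; // corrupt JSON: start fresh
  }
  if (typeof parsed !== "object" || parsed === null) return { ...DEFAULT_STATE };
  const p = parsed as Record<string, unknown>;
  return {
    volume: Math.min(1, Math.max(0, finiteOr(p.volume, DEFAULT_STATE.volume))),
    muted: p.muted === true,
    speed: finiteOr(p.speed, DEFAULT_STATE.speed),
    episode: parseEpisode(p.episode),
    position: Math.max(0, finiteOr(p.position, 0)),
  };
}

// Periodic save; the returned function is suitable as a useEffect cleanup.
function startAutosave(storage: StorageLike, getState: () => PersistedPlayerState): () => void {
  const id = setInterval(() => saveState(storage, getState()), SAVE_INTERVAL_MS);
  return () => clearInterval(id);
}
```

In the provider, restoreState(window.localStorage) would be called from a useEffect after the first render, and the function returned by startAutosave would be that effect's cleanup.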

PlayerBar (PlayerBar.tsx)

Fixed-position bar at the bottom of the viewport. Shows the current episode's cover thumbnail (linked to the episode page), title, venue/date, play/pause/skip buttons, progress bar with time display, volume slider, mute toggle, and playback speed cycle (0.5x–2x). Shows a spinning loader when audio is loading. When no episode is selected, displays “Select an episode to play.”
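Two small helpers a bar like this needs are sketched below: the speed-cycle step and a time formatter for the progress display. The exact speed steps are not documented, so SPEEDS is a hypothetical list spanning the stated 0.5x–2x range, and formatTime is only a guess at the kind of helper that lives in src/lib/utils.ts, not its actual contents.

```typescript
// Hypothetical speed steps within the documented 0.5x–2x range.
const SPEEDS: readonly number[] = [0.5, 0.75, 1, 1.25, 1.5, 2];

// Advance to the next speed, wrapping from 2x back to 0.5x.
// An unknown current value (e.g. restored from older storage) resets to 1x.
function nextSpeed(current: number): number {
  const i = SPEEDS.indexOf(current);
  return i === -1 ? 1 : SPEEDS[(i + 1) % SPEEDS.length] ?? 1;
}

// M:SS below an hour, H:MM:SS above; NaN (duration not yet known) shows 0:00.
function formatTime(totalSeconds: number): string {
  if (!Number.isFinite(totalSeconds) || totalSeconds < 0) return "0:00";
  const s = Math.floor(totalSeconds);
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const sec = String(s % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${sec}` : `${m}:${sec}`;
}
```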

EpisodeCard (EpisodeCard.tsx)

Card in the browse grid. Shows a gradient cover with the episode title, a play overlay button (only if the episode has audio), and card info (title, year, venue). The card body links to /episode/[showId] via next/link. The play button calls usePlayer().play(); if the episode is currently playing, the overlay shows a pause icon and gets a .playing class. Uses suppressHydrationWarning to handle localStorage-based player state.

RundownContent (RundownContent.tsx)

Renders the episode rundown with interactive timestamp links. Handles two rundown formats:

  • New format (506 episodes): HTML from the Python patcher with <a data-seek="SECONDS" href="#seek-HH:MM:SS"> links, passed through as-is.
  • Old format (394 episodes): Plain text with MM:SS and H:MM:SS timestamps at line starts. A two-pass regex (H:MM:SS first, then MM:SS) wraps each timestamp in an <a data-seek="…" href="#seek-HH:MM:SS"> link.

Deep link handling: On mount, reads window.location.hash. If it matches #seek-HH:MM:SS, starts playback at that position.

Correct episode switching: When a seek link is clicked, the component calls play(audioUrl, title, sub, showId, c1, c2, secs), which loads the correct episode at the right timestamp even if a different episode was already playing.
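Deep-link handling reduces to converting between #seek-HH:MM:SS and seconds. A minimal sketch with hypothetical function names; rejecting minute or second fields above 59 is an assumption, not documented behavior. In the component, the parsed value would be passed as the final secs argument of play().

```typescript
// Parse "#seek-HH:MM:SS" into seconds; returns null for any other hash.
function parseSeekHash(hash: string): number | null {
  const m = /^#seek-(\d{1,2}):([0-5]\d):([0-5]\d)$/.exec(hash);
  if (m === null) return null;
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]);
}

// Build the shareable hash for a position (always zero-padded HH:MM:SS).
function seekHash(seconds: number): string {
  const s = Math.max(0, Math.floor(seconds));
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `#seek-${pad(Math.floor(s / 3600))}:${pad(Math.floor((s % 3600) / 60))}:${pad(s % 60)}`;
}
```

For the documented example /episode/59159#seek-00:08:37, parseSeekHash yields 517 seconds.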

AutoSearchForm (AutoSearchForm.tsx)

Form wrapper that auto-submits on input. Select dropdowns trigger immediate navigation via router.replace(). Text inputs debounce 400ms before navigating. Uses next/navigation router (client-side) so the audio player is never interrupted. Prevents full browser form submission.
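The form's behavior rests on a trailing-edge debounce plus a URL builder. A sketch follows; buildSearchHref and the rule of dropping empty fields are hypothetical, and only the 400 ms wait comes from the description above.

```typescript
// Trailing-edge debounce: only the last call within `waitMs` actually runs.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const debounced = (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
  // Lets an immediate navigation (e.g. a select change) discard a pending one.
  const cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };
  return Object.assign(debounced, { cancel });
}

// Build the target URL from the form's current values, dropping empty fields.
function buildSearchHref(pathname: string, fields: Record<string, string>): string {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(fields)) {
    const v = value.trim();
    if (v !== "") params.set(name, v);
  }
  const qs = params.toString();
  return qs === "" ? pathname : `${pathname}?${qs}`;
}
```

In the component, text inputs would call a debounced wrapper around router.replace (from useRouter() in next/navigation) created with a 400 ms wait, while select changes would cancel any pending call and navigate immediately.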

ThemeToggle (ThemeToggle.tsx)

Dark/light mode toggle. Reads initial state from localStorage, toggles the dark class on <html>, and persists the preference. The CSS variable system in globals.css switches between warm paper tones (light) and dark grays (dark) based on the .dark class.
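The toggle logic is small enough to sketch in full. ClassListLike and StorageLike mirror only the parts of DOMTokenList and localStorage it touches, so the code runs outside a browser; the storage key "theme" and defaulting to light when nothing is stored are assumptions.

```typescript
// Minimal stand-ins for document.documentElement.classList and localStorage.
interface ClassListLike {
  contains(token: string): boolean;
  toggle(token: string, force?: boolean): boolean;
}
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type Theme = "light" | "dark";
const THEME_KEY = "theme"; // hypothetical storage key

// On mount: apply the saved preference (light when nothing is stored).
function applySavedTheme(classList: ClassListLike, storage: StorageLike): Theme {
  const theme: Theme = storage.getItem(THEME_KEY) === "dark" ? "dark" : "light";
  classList.toggle("dark", theme === "dark");
  return theme;
}

// On click: flip the .dark class and persist the new preference.
function toggleTheme(classList: ClassListLike, storage: StorageLike): Theme {
  const next: Theme = classList.contains("dark") ? "light" : "dark";
  classList.toggle("dark", next === "dark");
  storage.setItem(THEME_KEY, next);
  return next;
}
```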

7. REST API

All endpoints return JSON. Parameters are passed as query strings.

Endpoint                  | Parameters                      | Returns
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GET /api/episodes         | ?page ?q ?year ?audio ?rundown  | { episodes[], total, page, totalPages, years[] }
GET /api/episodes/:showId |                                 | Episode row (or 404)
GET /api/search           | ?q ?limit=50                    | { results[], years[] }
GET /api/stats            |                                 | { total, withAudio, totalSecs, noDuration, byYear[], years[] }
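The /api/stats payload can be derived from the episodes table as sketched below. The real handler most likely computes these values with SQL aggregates in src/lib/db.ts; this in-memory version only makes the arithmetic explicit. Counting a missing or unparseable duration as noDuration, taking the year from show_date_parsed, and the { year, count } element shape of byYear are all assumptions.

```typescript
// Columns of the episodes table that the statistics need.
interface StatsRow {
  show_date_parsed: string | null; // ISO date, e.g. "2017-11-25"
  audio_url: string | null;
  duration: string | null;         // "HH:MM:SS"
}

// Mirrors the /api/stats response; the byYear element shape is an assumption.
interface ArchiveStats {
  total: number;
  withAudio: number;
  totalSecs: number;
  noDuration: number;
  byYear: { year: number; count: number }[];
  years: number[];
}

// "HH:MM:SS" (or "MM:SS") to seconds; null when missing or malformed.
function durationToSeconds(duration: string | null): number | null {
  if (!duration) return null;
  const parts = duration.trim().split(":");
  if (parts.length < 2 || parts.length > 3 || !parts.every((p) => /^\d+$/.test(p))) return null;
  return parts.reduce((acc, p) => acc * 60 + Number(p), 0);
}

function computeStats(rows: StatsRow[]): ArchiveStats {
  const perYear = new Map<number, number>();
  let withAudio = 0;
  let totalSecs = 0;
  let noDuration = 0;
  for (const row of rows) {
    if (row.audio_url) withAudio++;
    const secs = durationToSeconds(row.duration);
    if (secs === null) noDuration++;
    else totalSecs += secs;
    const year = Number((row.show_date_parsed ?? "").slice(0, 4)); // 0 or NaN when missing
    if (year > 0) perYear.set(year, (perYear.get(year) ?? 0) + 1);
  }
  const years = Array.from(perYear.keys()).sort((a, b) => a - b);
  return {
    total: rows.length,
    withAudio,
    totalSecs,
    noDuration,
    byYear: years.map((year) => ({ year, count: perYear.get(year) ?? 0 })),
    years,
  };
}
```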

8. Audio Player Architecture

The player is built around a React Context pattern to ensure uninterrupted playback across page navigations:

  1. PlayerProvider wraps the entire app in layout.tsx. Since layouts never unmount during client-side navigation, the context and the <audio> element stay alive.
  2. The usePlayer() hook is consumed by PlayerBar, EpisodeCard, EpisodeClient, and RundownContent.
  3. State (volume, muted, speed, current episode, playback position) is persisted to localStorage every 2 seconds and restored after mount via a useEffect; this avoids hydration mismatches because SSR always sees the defaults.
  4. play() accepts an optional seekTo parameter (in seconds). If the same track is already loaded, it just seeks. If it is a different track, it loads the new source and seeks after the canplay event.
  5. Keyboard shortcuts (Space, ←→, M) are registered globally, with focus-element checks to avoid interfering with text inputs.
  6. The isLoading state is set to true on loadstart, waiting, and seeking events and back to false on canplay, play, seeked, and error events. The PlayerBar shows a spinning SVG while loading.
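The play()/seekTo behavior from items 4 and 6 can be sketched without React. AudioLike is a hypothetical subset of HTMLAudioElement so the code runs under Node, and PlayerController is an illustrative name, not the provider's actual structure. One design choice worth noting: the pending seek lives in a field rather than in a one-off canplay listener, so switching tracks twice in quick succession cannot apply the first track's offset to the second.

```typescript
// Hypothetical minimal subset of HTMLAudioElement, so the sketch runs under Node.
interface AudioLike {
  src: string;
  currentTime: number;
  play(): Promise<void>;
  addEventListener(type: string, listener: () => void): void;
}

// Item 6: events that set isLoading to true, and events that clear it.
const LOADING_ON = ["loadstart", "waiting", "seeking"];
const LOADING_OFF = ["canplay", "play", "seeked", "error"];

class PlayerController {
  private readonly audio: AudioLike;
  // Compared instead of audio.src, which browsers normalize to an absolute URL.
  private currentUrl: string | null = null;
  private pendingSeek: number | null = null; // applied at the next canplay

  constructor(audio: AudioLike, setLoading: (loading: boolean) => void) {
    this.audio = audio;
    for (const type of LOADING_ON) audio.addEventListener(type, () => setLoading(true));
    for (const type of LOADING_OFF) audio.addEventListener(type, () => setLoading(false));
    // Item 4: a seek requested together with a new source waits until it can play.
    audio.addEventListener("canplay", () => {
      if (this.pendingSeek !== null) {
        audio.currentTime = this.pendingSeek;
        this.pendingSeek = null;
      }
    });
  }

  play(url: string, seekTo?: number): Promise<void> {
    if (url === this.currentUrl) {
      // Same track already loaded: seek in place.
      if (seekTo !== undefined) this.audio.currentTime = seekTo;
    } else {
      // Different track: swap the source and remember where to start.
      // Overwriting pendingSeek discards any seek meant for a previous track.
      this.currentUrl = url;
      this.pendingSeek = seekTo ?? null;
      this.audio.src = url;
    }
    return this.audio.play();
  }
}
```

In a browser, the promise returned by play() can reject (for example under autoplay restrictions), so a real caller would catch it.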

9. Rundown Timestamp Seeking

Rundowns come in two formats from the database:

New Format (HTML from patcher)

<a class="seek-link" data-seek="3882" href="#seek-01:04:42">01:04:42</a> GK talks...

Already has data-seek (seconds) and href (shareable URL).

Old Format (plain text)

00:00 Logo<br/>
59:16 Welcome back<br/>
1:10:43 Puccini Mouth Wash<br/>
100:37 I SAW STARS<br/>

Timestamps mix two styles: most use MM:SS, with the minute count allowed to run past 59 (e.g., 59:16, or 100:37 for 1 hour 40 minutes 37 seconds), while others use H:MM:SS (e.g., 1:10:43). A two-pass regex on the client wraps them in seek links before rendering:

  1. Pass 1: Match H:MM:SS timestamps (1–2 digit hours, 2 digit minutes and seconds). Convert to total seconds.
  2. Pass 2: Match MM:SS timestamps (1–3 digit minutes, 2 digit seconds). Split the minutes into hours and minutes for the HH:MM:SS href (e.g., 100:37 becomes #seek-01:40:37) and compute total seconds for data-seek.

Each generated link gets data-seek="SECONDS" (for the click handler) and href="#seek-HH:MM:SS" (for right-click → Copy Link and deep links).
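The two passes can be written as two line-anchored regular expressions. This sketch assumes rundown lines are separated by newlines or <br/> tags, and that generated links reuse the new format's class="seek-link" (the old-format description only specifies data-seek and href); linkifyOldRundown and seekLink are hypothetical names.

```typescript
// A timestamp only counts at a line start: start of a line (after spaces) or right after <br>.
const LINE_START = String.raw`(^[ \t]*|<br\s*/?>\s*)`;
// Pass 1: H:MM:SS with 1–2 digit hours.
const H_MM_SS = new RegExp(LINE_START + String.raw`(\d{1,2}):(\d{2}):(\d{2})(?![:\d])`, "gm");
// Pass 2: MM:SS with 1–3 digit minutes (minutes may exceed 59, e.g. 100:37).
const MM_SS = new RegExp(LINE_START + String.raw`(\d{1,3}):(\d{2})(?![:\d])`, "gm");

const pad2 = (n: number): string => String(n).padStart(2, "0");

// Build a seek link whose href is always zero-padded HH:MM:SS.
function seekLink(label: string, seconds: number): string {
  const hh = pad2(Math.floor(seconds / 3600));
  const mm = pad2(Math.floor((seconds % 3600) / 60));
  const ss = pad2(seconds % 60);
  return `<a class="seek-link" data-seek="${seconds}" href="#seek-${hh}:${mm}:${ss}">${label}</a>`;
}

function linkifyOldRundown(text: string): string {
  const pass1 = text.replace(H_MM_SS, (_match: string, pre: string, h: string, m: string, s: string) =>
    pre + seekLink(`${h}:${m}:${s}`, Number(h) * 3600 + Number(m) * 60 + Number(s)),
  );
  // Lines converted in pass 1 now start with "<a", so pass 2 cannot wrap them again;
  // the (?![:\d]) guard also stops it from matching the "1:10" inside "1:10:43".
  return pass1.replace(MM_SS, (_match: string, pre: string, m: string, s: string) =>
    pre + seekLink(`${m}:${s}`, Number(m) * 60 + Number(s)),
  );
}
```

Applied to the old-format sample above, 100:37 becomes data-seek="6037" with href="#seek-01:40:37", and a mid-line time such as "at 7:30" is left alone because both patterns require a line start.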

The click handler calls e.preventDefault() to cancel the link navigation, updates window.location.hash for sharing, and calls play(audioUrl, …, secs) to load the correct episode at the right position.
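One way to implement this click behavior is a single delegated listener on the rundown container. The SeekAnchor, ClickTarget, and ClickLike interfaces are minimal stand-ins for DOM types so the sketch runs under Node, and PlayAt is a hypothetical reduced form of the documented play(audioUrl, title, sub, showId, c1, c2, secs) signature.

```typescript
// Minimal stand-ins for the DOM types involved (hypothetical, so this runs under Node).
interface SeekAnchor {
  getAttribute(name: string): string | null;
}
interface ClickTarget {
  closest(selector: string): SeekAnchor | null;
}
interface ClickLike {
  target: ClickTarget | null;
  preventDefault(): void;
}

// Hypothetical reduced play(): the documented call also passes title, sub, showId, c1, c2.
type PlayAt = (audioUrl: string, seconds: number) => void;

function makeSeekClickHandler(audioUrl: string, play: PlayAt, setHash: (hash: string) => void) {
  return (event: ClickLike): void => {
    // Event delegation: find the seek link the click landed in, if any.
    const link = event.target?.closest("a[data-seek]") ?? null;
    if (link === null) return; // not a timestamp link; leave the click alone
    const seconds = Number(link.getAttribute("data-seek"));
    if (!Number.isFinite(seconds)) return;
    event.preventDefault(); // cancel the in-page navigation
    const href = link.getAttribute("href");
    if (href !== null && href.startsWith("#seek-")) setHash(href); // keep the URL shareable
    play(audioUrl, seconds); // loads this episode even if another one is playing
  };
}
```

In the component, setHash would plausibly be (h) => { window.location.hash = h; }, and the handler would be attached with addEventListener("click", …) on the element holding the rundown HTML.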

11. Theme System

Styling uses Tailwind CSS v4 with project-specific CSS custom properties for theming. There is no tailwind.config.ts: Tailwind v4 uses CSS-based configuration via @import "tailwindcss".

Light/dark mode is controlled by the .dark class on <html>:

:root {
  --bg: #fef9f1;     /* warm paper */
  --accent: #8f4e0e; /* warm copper */
  /* ...light theme... */
}

.dark {
  --bg: #1a1a1a;
  --accent: #d4944a;
  /* ...dark theme... */
}

The ThemeToggle component reads localStorage on mount, sets the initial class, and toggles it on click. Fonts (Inter, EB Garamond, Public Sans) are loaded via next/font/google with CSS variable fallbacks (--font-body, --font-display, --font-label).

12. Python Data Pipeline

All scripts live in scripts/ and read/write the same SQLite database.

Script                      | Purpose
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
db_config.py                | Shared path helper; returns data/prairiehome.db
scrape_prairiehome.py       | Initial scraper: creates the episodes table, scrapes listing + detail pages
patch_prairiehome.py        | Enriches episodes: fetches page_title, rundown_url, rundown_content. The --migrate flag adds data-seek links to existing rundowns
download_audio.py           | Downloads MP3 files from audio_url to local storage
transcribe_episodes.py      | Speech-to-text via the Deepgram API; creates/updates the transcriptions table
detect_audio_events.py      | Detects applause/music/intermission breaks from transcript utterance gaps
train_segment_boundaries.py | ML model that predicts segment boundaries for episodes without rundowns
app.py                      | Original Flask app, kept for reference

13. Deployment

The app is configured with output: "standalone" in next.config.ts. Building produces a self-contained deployment at .next/standalone/:

.next/standalone/
├── server.js            # Entry point: node server.js
├── package.json         # Minimal manifest
├── node_modules/        # Traced runtime dependencies
├── data/
│   └── prairiehome.db   # Copied automatically
├── public/              # Static assets (copy manually)
└── .next/static/        # Compiled CSS/JS (copy manually)

Quick Deploy

# 1. Build + bundle
./deploy.sh ./deploy

# 2. Copy to server
rsync -avz deploy/ user@your-server:/srv/prairiehome/

# 3. Run
cd /srv/prairiehome
PORT=3000 node server.js

Process Management (pm2)

npm install -g pm2
pm2 start /srv/prairiehome/server.js --name prairiehome -i max
pm2 save
pm2 startup

Nginx Reverse Proxy

server {
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    listen 80;
}

Requirements

  • Node.js 20.9+ on the server (the minimum supported by Next.js 16)
  • ~180 MB disk space (DB + node_modules + static assets)
  • The server's OS, CPU architecture, and Node.js major version must match the build machine, because better-sqlite3 ships a native binary (otherwise run npm rebuild better-sqlite3 on the server)

14. Development Commands

npm run dev      # Start dev server (localhost:3000)
npm run build    # Production build → .next/standalone/
npm run lint     # ESLint check
npm start        # Start production server (after build)
./deploy.sh      # Build + bundle deployment directory

# Utility: query the database directly
sqlite3 data/prairiehome.db "SELECT ..."