This is an exploration into using AI capabilities to interactively explore archived podcast content. Overcast is a popular iOS podcast player that exposes some of its data to users; I (crossjam) use and recommend it.

retrocast began as a clone of Harold Martin’s overcast-to-sqlite, which provided the foundation for pulling podcast information from my Overcast account. retrocast honors the Apache 2.0 license from overcast-to-sqlite.
Save listening history and feed/episode info from Overcast to a SQLite database. Try exploring your podcast listening habits with Datasette!
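For example, once you have run the `save` command described below, you can browse the resulting database (`overcast.db` by default) in a local web UI. A minimal sketch, assuming you have uv available and want to run Datasette ephemerally via `uvx`:

```
# Serve the default database in a browsable Datasette instance
$ uvx datasette overcast.db
```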
- How to install
- Authentication
- Fetching and saving updates
- Extending and saving full feeds
- Downloading transcripts
- Episode Download Database
## How to install

You can try retrocast without installing it, using uv:

```
# Install uv -- https://docs.astral.sh/uv/
$ uvx git+https://github.com/crossjam/retrocast --help
```

To install it as a tool:

```
$ pipx install git+https://github.com/crossjam/retrocast
# or
$ uv tool install git+https://github.com/crossjam/retrocast
```

Or to upgrade:

```
$ pip install --upgrade git+https://github.com/crossjam/retrocast
# or
$ uv tool upgrade retrocast
```
## Authentication

Run this command to log in to Overcast (note: neither your password nor email are saved, only the auth cookie):

```
$ retrocast sync overcast auth
```

This will create a file called `auth.json`, containing the required value, in an XDG-conformant platform user directory. To save the file at a different path or filename, use the `--auth=myauth.json` option.
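Where exactly `auth.json` lands is platform dependent. As a loose sketch on Linux, assuming retrocast uses the same `net.memexponent.retrocast` application directory that the episode download feature documents below (a guess on my part):

```
# Hypothetical default location on Linux; your platform's XDG
# directory may differ
$ cat ~/.local/share/net.memexponent.retrocast/auth.json
```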
If you do not wish to save this information, you can manually download the "All data" file from the Overcast account page and pass it into the `save` command as described below.
## Fetching and saving updates

The `save` command retrieves all Overcast info and stores playlists, podcast feeds, and episodes in their respective tables, each with a primary key of `overcastId`.

```
$ retrocast save
```

By default, this saves to `overcast.db`, but another database file can be named explicitly:

```
$ retrocast save someother.db
```

By default, if an `auth.json` file is present, it will use the cookie from that file. You can point to a different location using `-a`:

```
$ retrocast save -a /path/to/auth.json
```

Alternately, you can skip authentication by passing in an OPML file you downloaded from Overcast:

```
$ retrocast save --load /path/to/overcast.opml
```

By default, the `save` command will save any OPML file it downloads adjacent to the database file in `archive/overcast/`. You can disable this behavior with `--no-archive` or `-na`.

For increased reporting verbosity, use the `-v` flag.
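Once a save has completed, you can sanity-check the results directly from the sqlite3 shell. The table names below are the ones `save` creates and `overcastId` is its documented primary key; other column names may differ:

```
# List the tables save created
$ sqlite3 overcast.db ".tables"
# Count the episodes pulled from Overcast
$ sqlite3 overcast.db "SELECT count(*) FROM episodes;"
```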
## Extending and saving full feeds

The `extend` command downloads the XML files for all feeds you are subscribed to and extracts their tags and attributes. These are stored in separate tables, `feeds_extended` and `episodes_extended`, with primary keys `xmlUrl` and `enclosureUrl` respectively. (See points 4 and 5 below for more information.)

```
$ retrocast extend
```

Like the `save` command, this will attempt to archive feeds to `archive/feeds/` by default. This can be disabled with `--no-archive` or `-na`.

It also supports the `-v` flag to print additional information.

There are a few caveats for this functionality:

1. The first invocation will require downloading and parsing an XML file for each feed you are subscribed to. (Subsequent invocations only require this for new episodes loaded by `save`.) Because this command may take a long time to run if you have many feeds, it is recommended to use the `-v` flag to observe progress.
2. This will increase the size of your database by approximately 2 MB per feed, so it may result in a large file if you subscribe to many feeds.
3. Certain feeds may not load due to e.g. authentication, rate limiting, or other issues. These will be logged to the console and the feed will be skipped. Likewise, an episode may appear in your `episodes` table but not in the extended information if it is no longer available.
4. The `_extended` tables use URLs as their primary key. This may potentially lead to unjoinable / orphaned episodes if the enclosure URL (i.e. the URL of the audio file) has changed since Overcast stored it; see the query sketched after this list.
5. There is no guarantee of which columns will be present in these tables aside from URL, title, and description. This command attempts to capture and normalize all XML tags contained in the feed, so it is likely that many columns will be created and only a few rows will have values for uncommon tags/attributes.
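As a sketch of the orphan check mentioned in point 4: the `enclosureUrl` primary key on `episodes_extended` is documented above, but the assumption that the `episodes` table carries a matching `enclosureUrl` column is mine:

```
# Episodes with no matching row in episodes_extended (possible orphans)
$ sqlite3 overcast.db "
SELECT e.overcastId
FROM episodes e
LEFT JOIN episodes_extended x ON x.enclosureUrl = e.enclosureUrl
WHERE x.enclosureUrl IS NULL;"
```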
Any suggestions for improving on these caveats are welcome, please open an issue!
## Downloading transcripts

The `transcripts` command downloads episode transcripts where available.

The `save` and `extend` commands MUST be run prior to this.

For episodes with a `podcast:transcript:url` value, the transcript will be downloaded from that URL and the download's location will then be stored in `transcriptDownloadPath`.

```
$ retrocast transcripts
```

Like the previous commands, this will save transcripts to `archive/transcripts/<feed title>/<episode title>` by default.

A different path can be set with the `-p`/`--path` flag.

It also supports the `-v` flag to print additional information.

There is also a `-s` flag to only download transcripts for starred episodes.
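Afterwards you can list what was fetched by querying the stored paths. A small sketch, assuming `transcriptDownloadPath` lives on `episodes_extended` alongside the other extracted tags (the column is documented above, but its table is not):

```
$ sqlite3 overcast.db "
SELECT enclosureUrl, transcriptDownloadPath
FROM episodes_extended
WHERE transcriptDownloadPath IS NOT NULL
LIMIT 10;"
```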
## Episode Download Database

The episode download database feature allows you to index and search podcast episodes downloaded via podcast-archiver. This creates a searchable database of your downloaded episode collection.

Use the `download podcast-archiver` command to download podcast episodes:

```
$ retrocast download podcast-archiver --feed https://example.com/feed.xml
```

By default, episodes are downloaded to `~/.local/share/net.memexponent.retrocast/episode_downloads/` (or the equivalent on your platform), with `.info.json` metadata files created automatically.
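To spot-check that the media and metadata landed, you can list the `.info.json` files; the path below is the documented Linux default, so substitute your platform's equivalent:

```
$ find ~/.local/share/net.memexponent.retrocast/episode_downloads/ -name '*.info.json'
```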
For more options, see:

```
$ retrocast download podcast-archiver --help
```

Initialize the episode database (one-time setup):

```
$ retrocast download db init
```

Scan your downloaded episodes and populate the database:

```
$ retrocast download db update
```

This will:

- Discover all media files in your downloads directory
- Extract metadata from `.info.json` files
- Index episode titles, descriptions, and show notes for full-text search
- Track file locations, sizes, and timestamps

Options:

- `--rescan`: Delete existing records and rebuild from scratch
- `--verify`: Check for missing files and mark them in the database
Search your downloaded episodes using full-text search:

```
$ retrocast download db search "machine learning"
$ retrocast download db search "python" --podcast "Talk Python To Me"
$ retrocast download db search "interview" --limit 10
```
The search looks across:
- Episode titles
- Descriptions
- Summaries
- Show notes
- Podcast titles
Results are displayed in a formatted table with episode details.
Complete workflow for downloading and indexing podcasts:

```
# One-time setup
retrocast download db init

# Download episodes (creates .info.json files automatically)
retrocast download podcast-archiver --feed https://example.com/feed.xml

# Index the downloaded episodes
retrocast download db update

# Search your collection
retrocast download db search "topic you're interested in"
```

Downloaded episodes are stored in the `episode_downloads` table within `retrocast.db` with the following information:

- Media file path and metadata
- Episode title, description, summary, and show notes
- Publication date and duration
- Full `.info.json` metadata as JSON
- File existence tracking
Full-text search is enabled via SQLite FTS5 for fast searching across all text fields.
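For ad-hoc queries beyond the built-in search, you can open the database directly. The `episode_downloads` table name is documented above; its exact column layout is not, so inspect the schema first:

```
# Count indexed episodes
$ sqlite3 retrocast.db "SELECT count(*) FROM episode_downloads;"
# Discover the actual columns (and the FTS5 table) before writing queries
$ sqlite3 retrocast.db ".schema"
```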
Full disclosure, this project is primarily "auditionware". The main goal is to provide something for potential external collaborators or employers to view and review. Yup, it’s a bit about me showing off. If you have strong opinions feel free to fork this sucker and take it where your heart desires.
However, pull requests are welcome, at least as criticism, feedback, and inspiration! There might be a lag in response or acceptance, though. For major changes, please open an issue first to discuss what you would like to change.
```
git clone https://github.com/crossjam/retrocast.git
cd retrocast
uv sync
uv run retrocast all -v
```

This project uses PoeThePoet for task automation. Available tasks:
```
# Run all QA checks (lint, type check, test)
uv run poe qa

# Individual tasks
uv run poe lint       # Run ruff linter
uv run poe lint:fix   # Run ruff and auto-fix issues
uv run poe type       # Run ty type checker
uv run poe test       # Run pytest with verbose output
uv run poe test:cov   # Run pytest with coverage report
uv run poe test:quick # Run pytest and stop on first failure

# List all available tasks
uv run poe --help
```

This project is linted with ruff and uses Black code formatting.
- Brian M. Dennis - bmd at bmdphd dot info
- Harold Martin - harold.martin at gmail