Cluster your Spotify library using interpretable audio features and GPT-powered lyric analysis. The pipeline produces a 33-dimensional feature vector in which every dimension has a human-readable meaning.
For the full methodology, design decisions, and technical deep-dives, see the accompanying essay.
Time estimate: ~3-4 hours for 1,500 songs (mostly waiting: downloads, audio extraction, GPT API calls). All steps cache progress, so you can stop/resume.
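The stop/resume behavior can be pictured with a small sketch. This is not the project's actual caching code; `process_with_cache`, the JSON file layout, and the `work` callback are all hypothetical names chosen for illustration. The idea is simply to persist each result as soon as it is computed and skip anything already stored on the next run.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, Iterable


def process_with_cache(
    items: Iterable[str],
    work: Callable[[str], dict],
    cache_path: Path,
) -> Dict[str, dict]:
    """Run `work` on each item, skipping items already stored in the cache."""
    cache: Dict[str, dict] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))

    for item in items:
        if item in cache:
            continue  # finished in an earlier run
        cache[item] = work(item)
        # Write to a temporary file and rename it over the cache, so an
        # interrupted run does not leave a half-written cache file behind.
        tmp_path = cache_path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(cache), encoding="utf-8")
        os.replace(tmp_path, cache_path)
    return cache
```

Rewriting the whole file after every item is wasteful for large libraries; a real implementation might batch writes or use one file per song, but the resume logic stays the same.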
- Extracts your Spotify library with metadata
- Downloads your Spotify saved songs as MP3s
- Fetches lyrics from Genius + Musixmatch
- Extracts audio features via Essentia (genre, mood, energy, etc.)
- Classifies lyrics via GPT (valence, themes, explicit content, etc.)
- Clusters songs into meaningful groups
- Visualizes with interactive 3D UMAP
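To illustrate why human-readable dimensions matter for the clustering step, here is a minimal sketch that labels a cluster by the features where it differs most from the library as a whole. The feature names (`energy`, `lyric_valence`, `acousticness`), the [0, 1] scaling, and the `describe_cluster` helper are assumptions made for this example, not the project's real feature set or code.

```python
from statistics import mean
from typing import Dict, List

Song = Dict[str, float]  # feature name -> value, assumed scaled to [0, 1]


def describe_cluster(cluster: List[Song], library: List[Song], top_n: int = 2) -> List[str]:
    """Name the features where the cluster differs most from the whole library."""
    deltas = {}
    for name in library[0]:
        cluster_avg = mean(song[name] for song in cluster)
        library_avg = mean(song[name] for song in library)
        deltas[name] = cluster_avg - library_avg
    # Largest absolute deviation first.
    ranked = sorted(deltas, key=lambda name: abs(deltas[name]), reverse=True)
    return [f"{'high' if deltas[n] > 0 else 'low'} {n}" for n in ranked[:top_n]]


# Toy data (hypothetical values, not real extraction output).
library = [
    {"energy": 0.9, "lyric_valence": 0.8, "acousticness": 0.1},
    {"energy": 0.8, "lyric_valence": 0.7, "acousticness": 0.2},
    {"energy": 0.2, "lyric_valence": 0.3, "acousticness": 0.9},
    {"energy": 0.3, "lyric_valence": 0.2, "acousticness": 0.8},
]
quiet_cluster = library[2:]
print(describe_cluster(quiet_cluster, library))  # ['high acousticness', 'low energy']
```

With opaque embedding dimensions the same ranking would produce labels like "high dim_17", which is the problem an interpretable vector avoids.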
- Python 3.9+
- FFmpeg (`brew install ffmpeg` / `apt install ffmpeg`)
- ~2GB disk for Essentia models
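A quick way to verify these prerequisites before starting a multi-hour run is sketched below. `check_prerequisites` is a hypothetical helper, not part of the repository; it uses only the standard library, and the 2 GiB threshold is a rough stand-in for the approximate "~2GB" figure above.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

REQUIRED_FREE_BYTES = 2 * 1024**3  # rough reading of "~2GB" (an assumption)


def check_prerequisites(
    python_version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
    free_bytes: Optional[int] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(".").free  # free space on the current drive

    problems = []
    if python_version < (3, 9):
        problems.append(
            f"Python 3.9+ required, found {python_version[0]}.{python_version[1]}"
        )
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if free_bytes < REQUIRED_FREE_BYTES:
        problems.append("less than ~2GB of free disk space")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter, `PATH`, and working directory; the parameters exist mainly so the checks can be exercised in isolation.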
Clone and install dependencies.
git clone https://github.com/yourusername/spotify-clustering.git
cd spotify-clustering
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

You'll need API credentials for Spotify (to fetch your library), Genius (for lyrics), and OpenAI (for lyric classification).
# .env file
# Spotify - https://developer.spotify.com/dashboard
# Set redirect URI to http://127.0.0.1:3000/callback
SPOTIFY_CLIENT_ID=...
SPOTIFY_CLIENT_SECRET=...
# Genius - https://genius.com/api-clients
GENIUS_ACCESS_TOKEN=...
# OpenAI - https://platform.openai.com/api-keys
OPENAI_API_KEY=...

Pulls metadata for your saved tracks from Spotify. The first run opens a browser for OAuth.

python spotify/fetch_spotify_saved_songs.py

Downloads MP3s for local audio analysis. Safe to stop/resume.

python songs/download_via_spotdl.py # or download_via_ytdlp.py

Fetches lyrics from Genius. Also safe to stop/resume.

python lyrics/fetch_lyrics.py

Extracts audio features (Essentia) and classifies lyrics (GPT). The first run is slow (~2-3 hours for 1,500 songs: ~90 min audio extraction + ~60 min GPT API calls); later runs use the cache.

python analysis/run_analysis.py --songs songs/data/ --lyrics lyrics/data/

Explore clusters, tune parameters, and visualize results.

streamlit run analysis/interactive_interpretability.py

Creates Spotify playlists from your clusters.

python export/export_clusters_as_playlists.py --dry-run # preview
python export/export_clusters_as_playlists.py # create

| File | Purpose |
|---|---|
| `analysis/run_analysis.py` | Main entry point |
| `analysis/interactive_interpretability.py` | Streamlit dashboard |
| `analysis/pipeline/interpretable_features.py` | 33-dim vector construction |
| `analysis/pipeline/audio_analysis.py` | Essentia feature extraction |
| `analysis/pipeline/lyric_features.py` | GPT lyric classification |
| `analysis/pipeline/config.py` | Configuration & scales |
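As a rough picture of what "vector construction" and "scales" could mean here, the sketch below merges audio and lyric features into one fixed-order vector, min-max scaling each dimension to [0, 1]. The three dimensions, their ranges, the midpoint default for missing values, and the `build_vector` name are all assumptions for illustration; the real 33 dimensions and their scales live in the files listed above.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of dimensions: (name, min, max) used for min-max scaling.
FEATURE_SCALES: List[Tuple[str, float, float]] = [
    ("bpm", 60.0, 200.0),
    ("energy", 0.0, 1.0),
    ("lyric_valence", -1.0, 1.0),
]


def build_vector(raw: Dict[str, float]) -> List[float]:
    """Scale each named feature to [0, 1], always in the same order."""
    vector = []
    for name, low, high in FEATURE_SCALES:
        # Missing features fall back to the midpoint of the range (an assumption).
        value = raw.get(name, (low + high) / 2)
        scaled = (value - low) / (high - low)
        vector.append(min(1.0, max(0.0, scaled)))  # clamp out-of-range values
    return vector


audio = {"bpm": 130.0, "energy": 0.7}   # e.g. from audio analysis
lyrics = {"lyric_valence": 0.5}          # e.g. from lyric classification
print(build_vector({**audio, **lyrics}))  # [0.5, 0.7, 0.75]
```

Keeping the order and scale of every dimension in one table means any position in the vector can be traced back to a named feature, which is what makes distances between songs explainable.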