
Multi-Lingual Music Recommendation System

This project is all about breaking down language barriers in music discovery. Our goal is to build a recommendation system that accepts a song as input and returns a playlist of songs similar to it, but in the user's language of choice. The project is similar to Spotify's "go to song radio" feature, except that instead of matrix factorization based on a user's listening history, we create the playlist using neural networks and a single input song. We use a mix of rich data sources and specialized models to analyze everything from the instruments used and micro-genres to the deeper lyrical concepts of a song. By automating the whole process, from data collection to model training and deployment, our system is both scalable and continuously improving.

Contributors

| Name | Role | Commit Contributions |
| --- | --- | --- |
| Ali Aslanbayli | Model Training | 80a21e481dc1e71dbf5df70b4cc3c4f67d294f8c |
| Farid Taghiyev | Continuous Pipeline Management | 23366c4a41c298ac869ba19c9bc10f9351d39b44 |
| Nevin Mathews Kuruvilla | Model Serving and Monitoring | 4446b9532a2bb7132299d2da55712eef27b6b261 |
| Rohan Subramaniam | Data Pipeline Development | bd6e2a5 |

System Diagram

Our system is designed to integrate several key components:

  • Data Ingestion: We gather information using the Spotify API, the Million Playlist Dataset, and additional insights from MusicBrainz and Last.fm.
  • Feature Engineering: The system extracts detailed features like instruments, specific micro genres, and the lyrical concepts behind each song.
  • Model Training & Evaluation: We combine multilingual text models with audio similarity techniques to train our recommendation engine.
  • Model Serving & Monitoring: Once the model is trained, it’s deployed as a service that’s constantly monitored and updated to ensure top performance.

model-diagram

Summary of Outside Materials

We rely on a diverse set of resources to build our system:

| Data/Model | How It Was Created | Usage Conditions |
| --- | --- | --- |
| Spotify API | Offers detailed track info and audio features via the Get Tracks route (Docs). | Use is governed by the Spotify Developer Terms. |
| Million Playlist Dataset | A vast collection of playlists that gives us insight into user preferences (AICrowd Challenge). | Usage follows the competition and data-sharing guidelines. |
| MusicBrainz & Last.fm APIs | Provide additional metadata and user insights to enrich our track information (MusicBrainz, Last.fm API). | Open for research under their respective API policies. |
| Multilingual Lyrics Model | Pre-trained models like LaBSE (LaBSE Docs) are used to understand lyrics across languages. | Open source under their respective licenses. |
| Audio Similarity | FAISS helps us quickly search through audio features (FAISS GitHub, Website). | Open source. |
| Collaborative Filtering | We explore ALS-based matrix factorization using the implicit library (Implicit GitHub) to capture playlist co-occurrence patterns. | Open source, based on academic research (Yifan Hu paper). |
| Additional Libraries | Annoy is used for fast, approximate nearest-neighbor searches (Annoy GitHub). | Open source. |

Summary of Infrastructure Requirements

We’ve carefully planned our infrastructure to support the project efficiently:

| Requirement | Usage | Why It's Needed |
| --- | --- | --- |
| m1.medium VMs | 3 instances running throughout the project lifecycle. | They provide a good balance of compute for daily tasks. |
| compute_liqid | Scheduled 4-hour blocks twice a week for heavy training tasks. | Speeds up the training of our embedding and audio feature models. |
| Floating IPs | 1 permanent IP for the main API endpoint, plus additional IPs for testing. | Ensures steady access to our service for users. |
| Persistent Storage | Cloud storage for our datasets, model checkpoints, and logs. | Keeps our work reproducible and our data readily accessible. |

Detailed Design Plan

Model Training and Training Platforms

We’re using a hybrid approach to model training. Our strategy combines:

  • Text-based Features: We use robust models like LaBSE to understand the lyrics in multiple languages.
  • Audio-based Features: FAISS helps us perform efficient similarity searches based on audio features.
  • Collaborative Filtering: We’re exploring ALS with the implicit library to uncover relationships from playlist co-occurrence data.

This approach ensures we capture both the semantic meaning of lyrics and the unique audio characteristics of each track. It’s designed to offer recommendations that truly resonate with diverse audiences.
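To make the retrieval idea concrete, here is a minimal sketch of the lyric-similarity part, assuming the sentence-transformers LaBSE checkpoint and a flat FAISS index; the toy catalog and query lyrics are placeholders, and the production system also folds in audio features and collaborative-filtering signals:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Embed lyrics with LaBSE so that songs in different languages
# share one multilingual embedding space.
model = SentenceTransformer("sentence-transformers/LaBSE")
catalog_lyrics = ["...lyrics of track A...", "...lyrics of track B..."]  # placeholder corpus
embeddings = model.encode(catalog_lyrics, normalize_embeddings=True)

# With L2-normalized vectors, inner product equals cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype=np.float32))

# Query: embed the input song's lyrics and take its nearest neighbors.
query = model.encode(["...lyrics of the input song..."], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype=np.float32), k=2)
print(ids[0], scores[0])  # indices and similarities of candidate tracks
```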

Model Serving and Monitoring Platforms

Once our models are trained, they’re deployed as containerized microservices. Our deployment strategy includes:

  • RESTful API Endpoints: To serve recommendations quickly and reliably.
  • Continuous Monitoring: To keep an eye on key metrics like response time and model performance.
  • Version Control: So we can easily update or roll back models as needed.

This setup guarantees that our system is always responsive, easy to maintain, and capable of evolving with new data.

Data Pipeline

Our data pipeline is built to handle everything from data collection to feature extraction:

  • Data Ingestion: We pull in data from various APIs (Spotify, MusicBrainz, Last.fm) and the Million Playlist Dataset.
  • Feature Engineering: We focus on extracting detailed information such as instruments, micro genres, and lyrical themes.
  • Processing & Storage: Both real-time and batch processes are used to clean, transform, and store the data efficiently.

This pipeline is key to keeping our recommendation engine up-to-date and accurate.

data-pipeline

Continuous X

To keep everything running smoothly, we have Continuous Integration, Continuous Training, and Continuous Deployment in place:

  • Automated Testing: Ensures every change is validated.
  • Scheduled Retraining: Models are updated based on new data and performance feedback.
  • Deployment Automation: New models and features are rolled out seamlessly, reducing downtime and manual intervention.

This continuous approach helps us maintain a system that’s both robust and ready for rapid improvement.

Implementation

CONTINUOUS X

Infrastructure and Infrastructure-as-Code

Our infrastructure is managed using Terraform, with configurations located in the infrastructure/terraform_configs directory. The infrastructure setup includes:

  • KVM-based virtualization setup for model training and deployment
  • Kubernetes cluster configuration for container orchestration
  • Network and security configurations

Key files:

  • infrastructure/terraform_configs/kvm/ - Contains KVM virtualization configurations
  • infrastructure/terraform_manage.sh - Script for managing Terraform operations

Staged Deployment

Our deployment process follows a staged approach using ArgoCD for GitOps-based deployments. The process includes:

  1. Training Stage

    • Model training workflow: infrastructure/workflows/train-model.yaml
    • Initial build workflow: infrastructure/workflows/build-initial.yaml
  2. Staging Deployment

    • Staging configuration: infrastructure/ansible_configs/argocd/argocd_add_staging.yml
    • Container build workflow: infrastructure/workflows/build-container-image.yaml
  3. Canary Deployment

    • Canary configuration: infrastructure/ansible_configs/argocd/argocd_add_canary.yml
    • Model promotion workflow: infrastructure/workflows/promote-model.yaml
  4. Production Deployment

    • Production configuration: infrastructure/ansible_configs/argocd/argocd_add_prod.yml
    • Container deployment workflow: infrastructure/workflows/deploy-container-image.yaml

CI/CD and Continuous Training

Our CI/CD pipeline is implemented using GitHub Actions and ArgoCD, with the following components:

  1. Continuous Training Triggers

    • New data availability
    • Model performance degradation
    • Scheduled retraining
    • Manual trigger
  2. Training to Deployment Pipeline

    • Training workflow: infrastructure/workflows/train-model.yaml
    • Model evaluation and promotion: infrastructure/workflows/promote-model.yaml
    • Container build and deployment: infrastructure/workflows/build-container-image.yaml and infrastructure/workflows/deploy-container-image.yaml
  3. GitOps-based Deployment

    • ArgoCD configurations in infrastructure/ansible_configs/argocd/
    • Workflow templates: infrastructure/ansible_configs/argocd/workflow_templates_apply.yml
    • Initial build workflow: infrastructure/ansible_configs/argocd/workflow_build_init.yml

The entire process is automated, with each stage having its own validation and rollback mechanisms to ensure safe and reliable deployments.

DATA PIPELINE

The data pipeline, developed by Rohan Subramaniam, handles multilingual music data collection, processing, and storage for training and inference. It is designed to process slices from the Million Playlist Dataset and generate multilingual training pairs based on lyric similarity.

Key components of the pipeline:

  • Playlist Slice Processing: Efficiently reads batches of playlists from the Million Playlist Dataset and extracts key metadata including track_uri, artist_name, and track_name.
  • Lyrics Fetching: Uses the OVH Lyrics API to fetch lyrics for each track. Tracks with no lyrics or insufficient word counts are discarded to ensure quality.
  • Language Detection: Uses fast-langdetect to detect the language of the fetched lyrics. English tracks are deduplicated to prevent overrepresentation.
  • Positive Pair Generation: Uses LaBSE embeddings to compute pairwise similarity and generate high-quality positive pairs across multiple languages (see the sketch after this list).
  • Track Embeddings: Encodes lyrics using the LaBSE model and stores the results as track_to_embedding mappings.
  • Storage Integration: Outputs (positive pairs, track metadata, and embeddings) are uploaded to MinIO buckets for persistent object storage and later retrieval by model training and serving components.
  • Batch Execution Support: Designed to run on separate slices and resume where it left off, ensuring no overwrite or data leakage across training batches.
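As an illustration of the detection and pair-generation steps, here is a minimal sketch, assuming fast-langdetect's detect helper and the sentence-transformers LaBSE checkpoint; the 0.8 similarity threshold and the two-track corpus are placeholders, not the values used in the notebook:

```python
from fast_langdetect import detect
from sentence_transformers import SentenceTransformer, util

tracks = {
    "spotify:track:A": "English lyrics ...",
    "spotify:track:B": "Paroles en français ...",
}

# 1) Tag each track with the detected language of its lyrics
#    (fast-langdetect expects single-line input, hence the newline strip).
languages = {uri: detect(lyrics.replace("\n", " "))["lang"] for uri, lyrics in tracks.items()}

# 2) Embed all lyrics with LaBSE into one multilingual space.
model = SentenceTransformer("sentence-transformers/LaBSE")
uris = list(tracks)
emb = model.encode([tracks[u] for u in uris], normalize_embeddings=True)

# 3) Keep cross-language pairs whose cosine similarity clears a threshold.
sim = util.cos_sim(emb, emb)
positive_pairs = [
    (uris[i], uris[j], float(sim[i][j]))
    for i in range(len(uris))
    for j in range(i + 1, len(uris))
    if languages[uris[i]] != languages[uris[j]] and sim[i][j] > 0.8
]
print(positive_pairs)
```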

Code and Scripts

  • Multilingual_Data_pipeline.ipynb: Main Jupyter notebook used for data processing and lyric-based pair generation.
  • scripts/: Shell scripts used to upload data to MinIO and to Chameleon block-storage persistent volumes.

Pipeline Flow Diagram

This diagram summarizes the full data processing pipeline:

Data Pipeline Flow

Sample Positive Pair

This image shows an example of a multilingual lyric pair used in contrastive training:

Sample Positive Pair

MODEL TRAINING

  1. Modeling: Since we are building a recommendation system, we decided to fine-tune an encoder model that groups similar songs, making it easy to pick the songs most similar to a given input song. Because we don't have ground-truth labels, we use contrastive learning, providing positive and negative pairs of lyrics as the input during fine-tuning. For the encoder, we went with LaBSE (Language-Agnostic BERT Sentence Encoder), as it has been shown to perform well on similarity-search tasks and multilingual inputs. Another benefit of LaBSE is its relatively small size of roughly 500M parameters; speed matters to us, as we don't want users to wait long for their recommendations.
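A minimal sketch of contrastive fine-tuning with sentence-transformers, using positive pairs like those produced by the data pipeline; MultipleNegativesRankingLoss treats the other in-batch examples as negatives, which is one common way to set this up, not necessarily the exact loss used in train/src/train_model_mlflow.py:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("sentence-transformers/LaBSE")

# Positive pairs of lyrics (anchor, positive) from the data pipeline.
train_examples = [
    InputExample(texts=["lyrics of song A", "its cross-lingual positive"]),
    InputExample(texts=["lyrics of song B", "its cross-lingual positive"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# In-batch contrastive loss: other examples in the batch act as negatives.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("labse-finetuned")  # placeholder output path
```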

  2. Train and re-train: There are two files used for training: one for training the model as a standalone process and one for training with the Ray Train scheduler. A good place to start is train/src/train_model_mlflow.py. A sample output from a training run can be found in train/nohup_train_mlflow.txt.

  3. Experiment tracking: Experiment tracking is done through MLflow. All model training jobs are logged to the MLflow dashboard, including their weights and loss plots. The best model is then saved to the model registry for inference.

mlflow_1 mlflow_2 mlflow_3
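For reference, here is a minimal sketch of how a training run might be logged and promoted with MLflow; the tracking URI, experiment name, loss values, and registry name are illustrative placeholders, not necessarily what train/src/train_model_mlflow.py uses:

```python
import mlflow
import mlflow.pytorch
from sentence_transformers import SentenceTransformer

mlflow.set_tracking_uri("http://mlflow:5000")  # assumed tracking-server address
mlflow.set_experiment("labse-finetune")        # illustrative experiment name

model = SentenceTransformer("labse-finetuned")  # fine-tuned checkpoint (placeholder path)

with mlflow.start_run() as run:
    mlflow.log_param("base_model", "sentence-transformers/LaBSE")
    for step, loss in enumerate([0.9, 0.5, 0.3]):  # stand-in loss curve
        mlflow.log_metric("train_loss", loss, step=step)
    # SentenceTransformer is a torch.nn.Module, so it can be logged as a PyTorch model.
    mlflow.pytorch.log_model(model, "model")

# Promote the run's model to the registry so the serving layer can pull it.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "music-recommender")
```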
  4. Scheduling training jobs: Training and re-training are done via automated shell recipes found in the train/Makefile.

  5. Optional: Although we have not implemented multi-GPU training, given our manageable model size, after fine-tuning the model is quantized to FP16 to make it smaller and faster to store in MLflow (MinIO); see the sketch after this list.

  6. Optional: Ray Train was implemented to schedule hyper-parameter-tuning training jobs and can be found in train/src/train_model_ray.py. There are also relevant Docker Compose files that set up the Ray worker containers in train/docker/docker-compose-ray-cuda.yaml. A sample output from a Ray Train job can be found in train/nohup_train_ray.txt.

ray
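A minimal sketch of the FP16 step mentioned above, assuming a sentence-transformers checkpoint; half() is the standard PyTorch way to cast weights, and the paths are placeholders:

```python
from sentence_transformers import SentenceTransformer

# Load the fine-tuned FP32 checkpoint (placeholder path).
model = SentenceTransformer("labse-finetuned")

# Cast all weights to FP16: this roughly halves the checkpoint size and
# speeds up GPU inference; SentenceTransformer is a torch.nn.Module, so
# the standard .half() cast applies.
model.half()

model.save("labse-finetuned-fp16")  # the smaller artifact that gets stored
```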

MODEL SERVING AND EVALUATION

API Endpoints

The API is built using FastAPI and is connected to a frontend interface. The backend pulls the latest model from MLflow for inference.

1. /recommend-playlist

  • Input: Spotify track URL, target language(s), number of recommended tracks
  • Function: Returns a list of similar songs in the specified language(s)

2. /feedback

  • Triggered by: User clicking the "thumbs down" button
  • Function: Logs feedback about poor recommendations for model improvement
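A minimal FastAPI sketch of these two endpoints, with hypothetical request schemas and stubbed handlers; the real service pulls the latest model from MLflow and runs the similarity search described earlier:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PlaylistRequest(BaseModel):   # hypothetical request schema
    track_url: str                  # Spotify track URL
    languages: list[str]            # target language codes, e.g. ["es", "fr"]
    num_tracks: int = 10

class FeedbackRequest(BaseModel):   # hypothetical feedback schema
    input_track_url: str
    bad_recommendation_url: str

@app.post("/recommend-playlist")
def recommend_playlist(req: PlaylistRequest):
    # In the real service: fetch lyrics, embed them with the latest MLflow
    # model, and run a nearest-neighbor search filtered by target language.
    return {"tracks": ["spotify:track:..."] * req.num_tracks}

@app.post("/feedback")
def feedback(req: FeedbackRequest):
    # Stored as a negative pair for the next contrastive training round.
    return {"status": "recorded"}
```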

Requirements Identification

As of the end of 2024, Spotify reported 675 million monthly active users, including 263 million paying subscribers. Given this extensive user base, the system is designed to handle moderate to large-scale usage, supporting thousands of users with low-latency responses. Recommendations are intended to be near real-time, with the ability to handle concurrent API requests efficiently.

It must support:

  • High-throughput model inference served via Triton or FastAPI endpoints.
  • Feedback collection for continuous learning and personalization.
  • Robust offline and online evaluation for monitoring model performance over time.

Model Optimization

The model optimization pipeline is implemented in optimization_pipeline.py. Run it using: python -m model_optimizations.run_optimization

  • CPU optimization results are stored in: v1
  • GPU optimization results are stored in: v2

System Optimization

System benchmarking is implemented in benchmark.py. Run it using: python -m server_optimizations.benchmark.benchmark

Offline Evaluation

Offline model evaluation is implemented in the tests directory and can be run using: pytest tests/ -v

Online Evaluation (Feedback Loop)

The system includes a feedback mechanism that allows users to report poor recommendations by clicking a thumbs down icon. This feedback generates a negative_pair file with:

  • anchor: lyrics of the input track
  • negative: lyrics of the selected (bad) recommendation

These negative pairs are used in contrastive training to improve future recommendations.
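A minimal sketch of how such a negative pair might be recorded, assuming JSON-lines storage; the file name and the helper are illustrative, only the anchor/negative fields come from the description above:

```python
import json
from pathlib import Path

def record_negative_pair(anchor_lyrics: str, negative_lyrics: str,
                         path: str = "negative_pairs.jsonl") -> None:
    """Append one thumbs-down event as a contrastive negative pair."""
    pair = {"anchor": anchor_lyrics, "negative": negative_lyrics}
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")

# Called when the /feedback endpoint receives a thumbs-down:
record_negative_pair("lyrics of the input track", "lyrics of the bad recommendation")
```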
