This repository hosts a production-ready MLOps pipeline for movie recommendations based on the MovieLens 1M dataset. It demonstrates a complete lifecycle from raw data processing to model serving via REST API and an interactive dashboard.
Key Features:
- Inference Engine: SVD model served via a high-performance FastAPI service.
- Frontend: Decoupled Streamlit dashboard for user interaction.
- Reproducibility: Fully Dockerized environment using explicit volume mounts.
- Quality Assurance: Automated end-to-end (E2E) and unit tests via Pytest.
- Experiment Tracking: MLflow integration for model metrics.
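For a sense of what the tracking hook looks like, MLflow metric logging typically follows this shape (a minimal sketch; the experiment name, parameters, and metric value below are illustrative placeholders, not results from this project):

```python
# Minimal MLflow logging sketch -- names and values here are
# illustrative placeholders, not actual metrics from this repository.
import mlflow

mlflow.set_experiment("movie-recommendation-svd")
with mlflow.start_run():
    mlflow.log_params({"n_factors": 100, "reg_all": 0.05})
    mlflow.log_metric("rmse", 0.87)  # placeholder value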
Project Structure
```
movie-recommendation-api/
│
├── app/                     # API Service (FastAPI)
│   ├── main.py              # Application Entry Point
│   └── schema.py            # Pydantic Data Contracts
│
├── dashboard/               # Frontend (Streamlit)
│   └── app.py               # UI Logic
│
├── notebooks/               # EDA & Experiments (Jupyter)
│   └── 01-Data-Exploration.ipynb
│
├── src/                     # ML Pipeline (Training & Processing)
│   ├── config.py            # Hyperparameters & Paths
│   ├── train.py             # Training Script (SVD + GridSearchCV)
│   └── data_processing.py   # ETL & Data Transformation Logic
│
├── tests/                   # Automated Test Suite
├── requirements.txt         # Production Dependencies
└── Dockerfile               # Container Configuration
```
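The split between app/main.py and app/schema.py follows the standard FastAPI pattern: Pydantic models define the request/response data contracts, and the endpoint consumes them. A minimal sketch of that wiring, assuming a /predict endpoint and illustrative field names (the real contract lives in app/schema.py):

```python
# Sketch of the API layer; the endpoint path and field names are
# assumptions -- see app/schema.py and app/main.py for the real code.
from fastapi import FastAPI
from pydantic import BaseModel

class PredictionRequest(BaseModel):   # request contract (schema.py)
    user_id: int
    movie_id: int

class PredictionResponse(BaseModel):  # response contract (schema.py)
    predicted_rating: float

app = FastAPI(title="Movie Recommendation API")

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest) -> PredictionResponse:
    # The real entry point loads models/recsys_svd_model.pkl and calls
    # its .predict(); a constant keeps this sketch self-contained.
    return PredictionResponse(predicted_rating=3.5)
```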
Prerequisites
- Python 3.10+
- Docker (Optional but recommended)
- MovieLens 1M Dataset (Place in data/raw/)
Installation
We recommend using a fresh virtual environment to avoid dependency conflicts.
```bash
# Clone the repository
git clone https://github.com/enesgulerml/movie-recommendation-api.git
cd movie-recommendation-api

# Create Environment
conda create -n movie-rec-sys python=3.10
conda activate movie-rec-sys

# Install Dependencies
pip install -r requirements.txt
pip install -e .
```

Training the Model
Since the trained model files are not included in the repository (due to size limits), you must train the model locally first.
This pipeline processes the raw data, trains the SVD model, and saves the artifacts to the models/ directory.
```bash
# Run the training pipeline
python -m src.train
```

✅ Success: Check that models/recsys_svd_model.pkl has been created.
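For orientation, the training script pairs SVD with a cross-validated grid search over the hyperparameters in src/config.py. A condensed sketch of that flow, assuming the scikit-surprise implementation and an illustrative (not the actual) parameter grid:

```python
# Condensed training-flow sketch, assuming scikit-surprise; the grid
# and paths are illustrative -- the real values live in src/config.py.
import pickle
from surprise import SVD, Dataset, Reader
from surprise.model_selection import GridSearchCV

# MovieLens 1M stores ratings as "UserID::MovieID::Rating::Timestamp".
reader = Reader(line_format="user item rating timestamp", sep="::")
data = Dataset.load_from_file("data/raw/ratings.dat", reader=reader)

param_grid = {"n_factors": [50, 100], "reg_all": [0.02, 0.05]}
gs = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=3)
gs.fit(data)
print("Best RMSE:", gs.best_score["rmse"], gs.best_params["rmse"])

# Refit the best model on the full dataset and persist the artifact.
model = gs.best_estimator["rmse"]
model.fit(data.build_full_trainset())
with open("models/recsys_svd_model.pkl", "wb") as f:
    pickle.dump(model, f)
```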
Serving the API with Docker
Once the model is trained, use Docker to serve the API. We mount your local models/ and data/ folders so the container can access the artifacts you just created.
- Build the Image:

```bash
docker build -t recsys-api:latest .
```

- Run the Container:

```bash
docker run -d --rm -p 8000:80 \
  -v "$(pwd)/models:/app/models" \
  -v "$(pwd)/data:/app/data" \
  recsys-api:latest
```

👉 Access API Docs: http://localhost:8000/docs
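Once the container is up, you can smoke-test it from Python before wiring up the dashboard (the /predict path and payload shape are assumptions here; the interactive docs show the actual contract):

```python
# Smoke test against the running container. Endpoint and payload are
# assumptions -- check http://localhost:8000/docs for the real contract.
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"user_id": 1, "movie_id": 1193},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```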
Dashboard
To launch the interactive frontend (make sure the API is running first):

```bash
streamlit run dashboard/app.py
```

👉 Access Dashboard: http://localhost:8501
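Because the frontend is decoupled, the dashboard talks to the API purely over HTTP. A minimal sketch of that pattern (the widgets and the /predict contract are illustrative assumptions, not the actual dashboard/app.py):

```python
# Dashboard-to-API flow sketch; widgets and endpoint are assumptions,
# not the repository's actual dashboard code.
import requests
import streamlit as st

st.title("Movie Recommender")
user_id = st.number_input("User ID", min_value=1, value=1)
movie_id = st.number_input("Movie ID", min_value=1, value=1193)

if st.button("Predict rating"):
    resp = requests.post(
        "http://localhost:8000/predict",
        json={"user_id": int(user_id), "movie_id": int(movie_id)},
        timeout=10,
    )
    st.json(resp.json())
```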
Testing
The project includes a test suite covering data integrity and API availability.

```bash
# Run all tests
pytest

# Run only fast tests (skip integration)
pytest -m "not slow"
```
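The -m "not slow" filter relies on pytest markers. A hypothetical sketch of how such a split usually looks (test names and the /predict payload are assumptions; the real suite lives in tests/):

```python
# Illustrative marker-based test split; names and payloads are
# hypothetical. Register the "slow" marker in pytest.ini/pyproject.toml.
import pytest
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_openapi_docs_available():
    # Fast check: the API serves its interactive docs.
    assert client.get("/docs").status_code == 200

@pytest.mark.slow
def test_prediction_round_trip():
    # Integration-style check, skipped by `pytest -m "not slow"`.
    resp = client.post("/predict", json={"user_id": 1, "movie_id": 1193})
    assert resp.status_code == 200
```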