# End-to-End MLOps Project with Docker, Kubernetes, and AWS

🚀 What I'm Building: A production-ready, enterprise-grade MLOps platform with:

✅ Phase 1: ML Pipeline & Training
✅ Phase 2: API & Streamlit UI
✅ Phase 3: Docker Containers
✅ Phase 4: Testing Suite
✅ Phase 5: CI/CD
✅ Phase 6: Kubernetes Deployment (AWS)
What problem are we solving?
- Suggestion: build a Customer Churn Prediction System (simple, practical, and demonstrates the full MLOps pipeline)
- Goal: predict customer churn to enable proactive retention strategies, reducing customer attrition by 15% and improving customer lifetime value

What's the impact?
- Business value: reduce customer churn by 15%
- Target: the marketing team can proactively reach at-risk customers
What data do we have?
- Customer demographics (age, location, tenure)
- Usage patterns (login frequency, feature usage)
- Transaction history (revenue, plan type)
Data size & freshness?
- For demo: Use Kaggle dataset (Telco Customer Churn)
- In production: Daily batch updates from database
What's good enough?
Target Metrics:
- Recall ≥ 80% (catch most churners)
- Precision ≥ 70% (avoid too many false alarms)
- F1-Score ≥ 0.75
- AUC-ROC ≥ 0.85
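These thresholds map directly onto confusion-matrix counts. A small self-contained sketch of how the gate could be checked; the counts below are illustrative, not results from the actual project:

```python
# How the gate metrics relate to confusion-matrix counts.
# The counts used below are illustrative, not project results.
def churn_metrics(tp, fp, fn, tn):
    recall = tp / (tp + fn)         # share of real churners we caught
    precision = tp / (tp + fp)      # share of flagged customers who really churn
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}

# Example: 85 churners caught, 15 missed, 30 false alarms
m = churn_metrics(tp=85, fp=30, fn=15, tn=870)
passed = m["recall"] >= 0.80 and m["precision"] >= 0.70 and m["f1"] >= 0.75
```

With these example counts the model clears all three thresholds.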
Model type?
- Start: Logistic Regression (baseline)
- Experiment: Random Forest, XGBoost, LightGBM
- Latency: < 200ms for predictions (REST API)
- Throughput: Handle 100 requests/second
- Availability: 99.5% uptime
- Python 3.10+
- pip
- Git
```bash
# Clone the repository
git clone https://github.com/FraidoonOmarzai/Chunk_Prediction-MLOps-.git
cd Chunk_Prediction-MLOps-

# Create virtual environment
conda create -p ./venv python=3.11 -y

# Activate virtual environment
conda activate ./venv

# Install dependencies
pip install -r requirements.txt

# Create the project structure
touch template.py
python3 template.py
```
✅ Data ingestion & validation
✅ Feature engineering
✅ Model training (4 algorithms)
✅ MLflow experiment tracking
✅ Model evaluation & selection
Option 1 - download manually:
- Visit: https://www.kaggle.com/datasets/blastchar/telco-customer-churn
- Download the dataset
- Save as: data/raw/churn_data.csv

Option 2 - download via the Kaggle API:

```bash
# Install Kaggle
pip install kaggle

# Set up credentials (~/.kaggle/kaggle.json)
# Then run:
python scripts/download_data.py
```

Development workflow:
1. Prototype in a Jupyter Notebook
   - Explore ideas, try models, test functions, and visualize results.
   - This is the "playground" phase.
2. Move stable code into Python scripts (.py)
   - Once the experiment code works in the notebook, move the clean, reusable parts into Python files.
Components:
- src/components/data_ingestion.py
- src/components/data_validation.py
- src/components/data_preprocessing.py
- src/components/model_trainer.py
- src/components/model_evaluation.py

Pipelines:
- src/pipeline/training_pipeline.py

Utilities:
- scripts/train.py
3. Run Experiments From Scripts
- Running .py files is better for:
- long training jobs
- large experiments
- automated logging
- reproducibility
## 🎯 Run Training Pipeline
### Execute Complete Pipeline
```bash
python scripts/train.py
```
- Data Ingestion: Loads and splits data (80/20)
- Data Validation: Checks schema and quality
- Data Preprocessing: Cleans and transforms features
- Model Training: Trains 4 models with MLflow tracking
- Model Evaluation: Compares models and selects best
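The five stages above run in sequence, each consuming the previous stage's outputs. A dependency-free sketch of how scripts/train.py could chain them; the stage functions and their outputs are illustrative stand-ins, not the project's real components:

```python
# Simplified sketch of a staged training pipeline.
# Stage functions here are stand-ins, not the project's real components.
def run_pipeline(stages):
    results = {}
    for name, stage in stages:
        results[name] = stage(results)  # each stage can read prior outputs
    return results

stages = [
    ("ingestion", lambda r: {"rows": 7043, "split": "80/20"}),
    ("validation", lambda r: {"schema_ok": True}),
    ("preprocessing", lambda r: {"n_features": 30}),
    ("training", lambda r: {"models": ["logreg", "rf", "xgb", "lgbm"]}),
    ("evaluation", lambda r: {"best_model": "xgb"}),
]
results = run_pipeline(stages)
print(results["evaluation"]["best_model"])  # -> xgb
```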
Expected output:

```
======================================================================
TRAINING PIPELINE COMPLETED SUCCESSFULLY!
======================================================================
Best Model: xgboost
Models trained: 4
Preprocessor saved at: artifacts/preprocessors/preprocessor.pkl
Check MLflow UI for detailed experiment tracking:
Run: mlflow ui
Open: http://localhost:5000
======================================================================
```

Start the MLflow UI:

```bash
mlflow ui
```

Open browser: http://localhost:5000

The UI shows:
- All experiment runs
- Parameters for each model
- Metrics (accuracy, precision, recall, F1, ROC-AUC)
- Model artifacts
- Comparison charts
- Recall: ≥ 80% (catch most churners)
- Precision: ≥ 70% (avoid false alarms)
- F1-Score: ≥ 0.75
- ROC-AUC: ≥ 0.85
- Accuracy
- Precision, Recall, F1-Score
- ROC-AUC
- Confusion Matrix
- Specificity, Sensitivity
- Classification Report
```bash
# Check evaluation report
cat artifacts/metrics/evaluation_report.json

# Check validation report
cat artifacts/validation_report.json
```

Configuration options:
- Data paths
- Train/test split ratio
- Feature lists
- Artifact locations
- MLflow settings

Model configuration options:
- Hyperparameters for each model
- Algorithm-specific settings
- Training parameters
Logs are saved in: logs/

```
[2024-11-04 10:30:45] INFO - ChurnPrediction - Starting training pipeline
[2024-11-04 10:30:46] INFO - ChurnPrediction - Data loaded: (7043, 21)
```
```
artifacts/
├── models/
│   ├── logistic_regression.pkl
│   ├── random_forest.pkl
│   ├── xgboost.pkl
│   └── lightgbm.pkl
├── preprocessors/
│   ├── preprocessor.pkl
│   └── preprocessor_label_encoder.pkl
├── metrics/
│   └── evaluation_report.json
└── validation_report.json
```
- Logistic Regression (Baseline)
- Random Forest (Ensemble)
- XGBoost (Gradient Boosting)
- LightGBM (Fast Gradient Boosting)
- Stratified train/test split (80/20)
- Standard scaling for numerical features
- One-hot encoding for categorical features
- Automated hyperparameter configuration
- MLflow tracking for all experiments
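A dependency-free illustration of the two feature transforms listed above; the project itself uses a fitted preprocessor saved under artifacts/preprocessors/:

```python
# Hand-rolled versions of standard scaling and one-hot encoding,
# for illustration only; the project uses a fitted preprocessor.
def standard_scale(values):
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

tenure_scaled = standard_scale([1, 12, 24, 48])  # numerical feature
contract = one_hot("Month-to-month",
                   ["Month-to-month", "One year", "Two year"])  # categorical
```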
✅ FastAPI REST API
✅ Streamlit dashboard
✅ Real-time predictions
✅ Batch processing
Phase 2 adds FastAPI REST API and Streamlit Dashboard for real-time predictions and interactive visualizations.
- Loads trained models for inference
- Handles single and batch predictions
- Calculates risk levels
- Feature importance extraction
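The risk level returned by the API can be derived from the churn probability with simple bucketing. The thresholds below are assumptions for illustration, not values taken from the project source:

```python
# Hypothetical risk-level bucketing; thresholds are assumed, not
# taken from the project's prediction service.
def risk_level(churn_probability: float) -> str:
    if churn_probability >= 0.7:
        return "High"
    if churn_probability >= 0.4:
        return "Medium"
    return "Low"
```

With these assumed thresholds, the sample response below (churn probability 0.7245) would be bucketed as "High".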
- RESTful endpoints for predictions
- Request/response validation with Pydantic
- Auto-generated API documentation
- Health checks and monitoring
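The real API declares its request schema on a pydantic.BaseModel and gets validation plus the auto-generated docs for free. As a dependency-free stand-in, a dataclass with manual checks shows the same idea (fields abbreviated, checks illustrative):

```python
from dataclasses import dataclass

# Dependency-free stand-in for the Pydantic request model; the real
# API would declare these fields on pydantic.BaseModel instead.
@dataclass
class CustomerRequest:
    gender: str
    tenure: int
    MonthlyCharges: float

    def __post_init__(self):
        if self.gender not in ("Male", "Female"):
            raise ValueError(f"invalid gender: {self.gender}")
        if self.tenure < 0:
            raise ValueError("tenure must be non-negative")

ok = CustomerRequest(gender="Female", tenure=12, MonthlyCharges=70.35)
try:
    CustomerRequest(gender="???", tenure=-1, MonthlyCharges=0.0)
    rejected = False
except ValueError:
    rejected = True
```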
- Interactive web interface
- Single customer prediction
- Batch CSV upload
- Visualizations and analytics
```bash
pip install --upgrade pip
pip install fastapi 'uvicorn[standard]' streamlit plotly python-multipart
```

Or install from the updated requirements.txt:

```bash
pip install -r requirements.txt
```

```bash
# If you haven't trained models yet
python scripts/train.py

# Start the API
python run_api.py
```

The API will be available at:
- Main API: http://localhost:8000
- Interactive Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

```bash
python run_streamlit.py
```

The dashboard will open at: http://localhost:8501
GET http://localhost:8000/health

Response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "preprocessor_loaded": true,
  "api_version": "1.0.0"
}
```

POST http://localhost:8000/predict

Request Body:
```json
{
  "customer": {
    "gender": "Female",
    "SeniorCitizen": 0,
    "Partner": "Yes",
    "Dependents": "No",
    "tenure": 12,
    "PhoneService": "Yes",
    "MultipleLines": "No",
    "InternetService": "Fiber optic",
    "OnlineSecurity": "No",
    "OnlineBackup": "Yes",
    "DeviceProtection": "No",
    "TechSupport": "No",
    "StreamingTV": "Yes",
    "StreamingMovies": "No",
    "Contract": "Month-to-month",
    "PaperlessBilling": "Yes",
    "PaymentMethod": "Electronic check",
    "MonthlyCharges": 70.35,
    "TotalCharges": 840.5
  }
}
```

Response:
```json
{
  "prediction": "Yes",
  "prediction_label": 1,
  "churn_probability": 0.7245,
  "no_churn_probability": 0.2755,
  "confidence": 0.7245,
  "risk_level": "High"
}
```

POST http://localhost:8000/predict/batch

Request Body:
```json
{
  "customers": [
    { /* customer 1 data */ },
    { /* customer 2 data */ }
  ]
}
```

Response:
```json
{
  "predictions": [ /* array of predictions */ ],
  "total_customers": 2,
  "high_risk_count": 1
}
```

Additional endpoints:
- GET http://localhost:8000/model/info
- GET http://localhost:8000/model/feature-importance

Interactive docs:
- Start the API server: python run_api.py
- Open browser: http://localhost:8000/docs
- Try out endpoints directly in the browser
Run the test script:

```bash
python test_api.py
```

Or use curl:

```bash
# Health check
curl http://localhost:8000/health

# Single prediction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @sample_request.json
```

Or call the API from Python:

```python
import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={"customer": {...}},  # customer data as in the request body above
)
print(response.json())
```

✅ Docker images (API, Streamlit, Training)
✅ Docker Compose orchestration
✅ Multi-stage builds
✅ Pushed to Docker Hub
Phase 3 containerizes the entire application using Docker, enabling consistent deployments across any environment.
- API Image - FastAPI service
- Streamlit Image - Web dashboard
- Training Image - Model training pipeline
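As a rough sketch, Dockerfile.api might look like the following; the base image, copied paths, and app module (api.main:app) are assumptions for illustration, not the repository's actual file:

```dockerfile
# Hypothetical sketch of docker/Dockerfile.api -- paths and the
# app module are assumptions, not the project's actual file.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ src/
COPY api/ api/
COPY artifacts/ artifacts/
EXPOSE 8000
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```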
New files:
- docker/Dockerfile.api - API container definition
- docker/Dockerfile.streamlit - Streamlit container
- docker/Dockerfile.training - Training container
- docker-compose.yml - Multi-container orchestration
- .dockerignore - Exclude unnecessary files
- .env.example - Environment template
- Build & push scripts
```
mlops-churn-prediction/
├── docker/
│   ├── Dockerfile.api          # NEW - API container
│   ├── Dockerfile.streamlit    # NEW - Streamlit container
│   └── Dockerfile.training    # NEW - Training container
│
├── docker-compose.yml          # NEW - Orchestration
├── .dockerignore               # NEW - Exclude files
├── .env.example                # NEW - Environment template
│
└── scripts/
    ├── build_images.sh/bat     # NEW - Build script
    ├── push_images.sh/bat      # NEW - Push to Docker Hub
    └── run_docker.sh/bat       # NEW - Run containers (.sh for Mac/Linux)
```

```bash
# Install Docker Desktop
# Download from: https://www.docker.com/products/docker-desktop

# Verify installation
docker --version
docker-compose --version
```

```bash
# Edit .env file
# Set DOCKER_USERNAME to your Docker Hub username
nano .env  # or use your favorite editor
```

Example .env:

```
DOCKER_USERNAME=yourusername
VERSION=v1.0.0
```

Make the scripts executable:

```bash
chmod +x scripts/build_images.sh
chmod +x scripts/push_images.sh
chmod +x scripts/run_docker.sh
```

Build the images.

Linux/Mac:

```bash
./scripts/build_images.sh
```

Windows:

```bash
scripts\build_images.bat
```

This builds all 3 Docker images (~5-10 minutes the first time).
```bash
# Start all services
docker-compose up -d

# Or use the helper script
./scripts/run_docker.sh start
```

Services:
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Streamlit: http://localhost:8501
- MLflow: http://localhost:5000
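A hypothetical docker-compose.yml matching the services above might look like this; image names follow the conventions in this guide, and the volume mounts and MLflow image are assumptions:

```yaml
# Hypothetical docker-compose.yml sketch -- not copied from the repo.
services:
  api:
    image: ${DOCKER_USERNAME}/churn-prediction-api:${VERSION}
    ports: ["8000:8000"]
    volumes: ["./artifacts:/app/artifacts", "./logs:/app/logs"]
  streamlit:
    image: ${DOCKER_USERNAME}/churn-prediction-streamlit:${VERSION}
    ports: ["8501:8501"]
    depends_on: [api]
  mlflow:
    image: ghcr.io/mlflow/mlflow
    ports: ["5000:5000"]
    command: mlflow ui --host 0.0.0.0
```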
```
┌──────────────────────────────────────────┐
│          Docker Compose Network          │
│                                          │
│  ┌──────────┐ ┌──────────┐ ┌─────────┐   │
│  │   API    │ │Streamlit │ │ MLflow  │   │
│  │  :8000   │ │  :8501   │ │  :5000  │   │
│  └────┬─────┘ └────┬─────┘ └────┬────┘   │
│       │            │            │        │
│       └────────────┴────────────┘        │
│              Shared Network              │
└──────────────────────────────────────────┘
        │                    │
    [Volumes]            [Volumes]
    artifacts/           mlflow/
    logs/                data/
```
```
┌──────────────────────┐
│ Dockerfile.training  │ ──build──> 🐳 training-image
└──────────────────────┘
┌──────────────────────┐
│ Dockerfile.streamlit │ ──build──> 🐳 streamlit-image
└──────────────────────┘
┌──────────────────────┐
│ Dockerfile.api       │ ──build──> 🐳 api-image
└──────────────────────┘
          │
          ▼
┌──────────────────────┐
│   docker-compose     │ ──orchestrates──> 🚀 All containers running together
└──────────────────────┘
```
Docker Hub setup:
- Go to https://hub.docker.com
- Sign up for a free account
- Create an access token (Settings → Security → New Access Token)

```bash
docker login
# Enter username and password/token
```

Push the images.

Linux/Mac:

```bash
./scripts/push_images.sh
```

Windows:

```bash
scripts\push_images.bat
```

Manual push:

```bash
docker push username/churn-prediction-api:latest
docker push username/churn-prediction-streamlit:latest
docker push username/churn-prediction-training:latest
```

Verify at: https://hub.docker.com/u/yourusername
Change the API port - edit docker-compose.yml:

```yaml
services:
  api:
    ports:
      - "8080:8000"  # Change 8080 to your port
```

Add environment variables:

```yaml
services:
  api:
    environment:
      - CUSTOM_VAR=value
      - LOG_LEVEL=DEBUG
```

Change the model - edit .env:

```
MODEL_PATH=artifacts/models/random_forest.pkl
```

Set resource limits:

```yaml
services:
  api:
    deploy:
      resources:
        limits:
          cpus: "1"
          memory: 1G
```

Test the API:

```bash
curl http://localhost:8000/health

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @test_data.json
```

Test the dashboard - open browser: http://localhost:8501
Test MLflow - open browser: http://localhost:5000

Run the training container:

```bash
docker-compose --profile training up training
```

Monitor resources:

```bash
# Real-time stats
docker stats

# Specific container
docker stats churn-prediction-api
```

View logs:

```bash
# All logs
docker-compose logs

# Follow logs
docker-compose logs -f

# Last 100 lines
docker-compose logs --tail=100
```

Check container health:

```bash
# Check health
docker ps

# Inspect health
docker inspect churn-prediction-api | grep Health -A 10
```

Deploy on any machine with Docker:

```bash
# Pull images
docker pull username/churn-prediction-api:latest
docker pull username/churn-prediction-streamlit:latest

# Run
docker-compose up -d
```

Use environment-specific overrides:

```bash
# Development
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up

# Production
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up
```

✅ Unit tests (70%+ coverage)
✅ Integration tests
✅ Data quality tests
✅ Model performance tests
✅ Automated test suite
Phase 4 implements a comprehensive testing framework covering unit tests, integration tests, data validation, model performance, and API testing.
- Unit Tests - Individual component testing
- Integration Tests - Pipeline and API testing
- Data Tests - Data quality and validation
- Model Tests - Performance and fairness testing
New files:
- pytest.ini - Pytest configuration
- .coveragerc - Coverage settings
- tests/conftest.py - Shared fixtures
- tests/unit/test_data_ingestion.py
- tests/unit/test_prediction_pipeline.py
- tests/integration/test_api_endpoints.py
- tests/data/test_data_quality.py
- tests/model/test_model_performance.py
- scripts/run_tests.sh - Test runner (Linux/Mac)
- scripts/run_tests.bat - Test runner (Windows)

Install the test dependencies:

```bash
pip install pytest pytest-cov pytest-mock pytest-timeout
```

Or use the existing requirements.txt:

```bash
pip install -r requirements.txt
```

Run the tests:

```bash
# Linux/Mac
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh all

# Windows
scripts\run_tests.bat all

# Or directly with pytest
pytest -v
```

Generate a coverage report:

```bash
./scripts/run_tests.sh coverage

# Then open in browser
open htmlcov/index.html      # Mac
xdg-open htmlcov/index.html  # Linux
start htmlcov\index.html     # Windows
```

Unit tests cover individual components in isolation.
Run:

```bash
pytest -m unit -v
./scripts/run_tests.sh unit
```

Tests:
- Data ingestion logic
- Data preprocessing
- Model training components
- Prediction pipeline
- Utility functions
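A hypothetical unit test in the spirit of tests/unit/test_data_ingestion.py; the real tests and fixtures live in the repo, and the split helper here is a stand-in:

```python
# Hypothetical unit test -- the split helper is a stand-in for the
# real ingestion component, not the project's actual code.
def train_test_split_sizes(n_rows, test_ratio=0.2):
    n_test = int(n_rows * test_ratio)
    return n_rows - n_test, n_test

def test_split_is_80_20():
    n_train, n_test = train_test_split_sizes(7043)  # Telco dataset size
    assert n_train + n_test == 7043
    assert abs(n_test / 7043 - 0.2) < 0.01
```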
Example:

```bash
pytest tests/unit/test_data_ingestion.py -v
```

Integration tests cover complete workflows and API endpoints.
Run:

```bash
pytest -m integration -v
./scripts/run_tests.sh integration
```

Tests:
- End-to-end training pipeline
- API endpoint responses
- Service communication
- Complete prediction workflow

Example:

```bash
pytest tests/integration/test_api_endpoints.py -v
```

Data tests validate data quality and integrity.
Run:

```bash
pytest -m data -v
./scripts/run_tests.sh data
```

Tests:
- Missing values
- Data types
- Valid categories
- Numerical ranges
- Data consistency
- Class distribution

Example:

```bash
pytest tests/data/test_data_quality.py -v
```

Model tests ensure the model meets performance requirements.
Run:

```bash
pytest -m model -v
./scripts/run_tests.sh model
```

Tests:
- Minimum accuracy (60%)
- Minimum precision (50%)
- Minimum recall (50%)
- Minimum F1 score (55%)
- Minimum ROC-AUC (70%)
- No constant predictions
- Fairness across groups
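A hypothetical performance-gate check mirroring the thresholds above, run against synthetic predictions rather than a trained model:

```python
# Hypothetical model-performance gate; predictions below are
# synthetic, not from the project's trained model.
def passes_performance_gate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    not_constant = len(set(y_pred)) > 1  # reject degenerate predictors
    return acc >= 0.60 and precision >= 0.50 and recall >= 0.50 and not_constant

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
```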
Example:

```bash
pytest tests/model/test_model_performance.py -v
```

Tests are organized using pytest markers:
```bash
# Run specific marker
pytest -m unit
pytest -m integration
pytest -m data
pytest -m model
pytest -m api
pytest -m slow
pytest -m requires_model

# Combine markers
pytest -m "unit and not slow"
pytest -m "integration or api"

# Exclude markers
pytest -m "not slow"
```

Check coverage:

```bash
pytest --cov=src --cov=api --cov-report=term-missing
```

Report formats:
- Terminal - Summary in console
- HTML - Detailed report in htmlcov/
- XML - For CI/CD integration

```bash
# Generate report
pytest --cov=src --cov=api --cov-report=html

# Open in browser
open htmlcov/index.html
```

Useful pytest commands:

```bash
# Run all tests
pytest

# Verbose output
pytest -v

# Show print statements
pytest -s

# Stop on first failure
pytest -x

# Run last failed tests
pytest --lf

# Run specific file
pytest tests/unit/test_data_ingestion.py

# Run specific test
pytest tests/unit/test_data_ingestion.py::test_function_name
```

Test runner script options:

```bash
./scripts/run_tests.sh all          # All tests
./scripts/run_tests.sh unit         # Unit tests
./scripts/run_tests.sh integration  # Integration tests
./scripts/run_tests.sh data         # Data tests
./scripts/run_tests.sh model        # Model tests
./scripts/run_tests.sh api          # API tests
./scripts/run_tests.sh fast         # Exclude slow tests
./scripts/run_tests.sh coverage     # With coverage
./scripts/run_tests.sh ci           # CI pipeline
./scripts/run_tests.sh clean        # Clean artifacts
```

Profile the test suite:

```bash
# Show slowest tests
pytest --durations=10

# Set timeout
pytest --timeout=60
```

Run tests in parallel:

```bash
# Install plugin
pip install pytest-xdist

# Run in parallel
pytest -n auto
```

✅ GitHub Actions workflows
✅ Automated testing
✅ Docker builds & pushes
✅ Security scanning
✅ Code quality checks
✅ Deployment automation
Phase 5 implements comprehensive CI/CD pipelines using GitHub Actions for automated testing, building, security scanning, and deployment.
Workflows:
- CI Pipeline (ci.yml) - Automated testing & validation
- Docker Build (docker-build.yml) - Build & push Docker images
- Code Quality (code-quality.yml) - Linting & formatting
- Security (security.yml) - Vulnerability scanning
- Deployment (deploy.yml) - AWS EKS deployment

Configuration files:
- .pre-commit-config.yaml - Pre-commit hooks
- .flake8 - Flake8 linting config
- pyproject.toml - Black, isort, mypy config
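A plausible .pre-commit-config.yaml wiring up the formatting and linting hooks listed above; pin rev to whatever versions the project actually uses:

```yaml
# Hypothetical .pre-commit-config.yaml -- revisions are examples,
# not the project's pinned versions.
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
```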
```bash
# Initialize git (if not already)
git init
git add .
git commit -m "Initial commit"

# Create GitHub repository and push
git remote add origin https://github.com/yourusername/your-repo.git
git push -u origin main
```

Go to: Settings → Secrets and variables → Actions

Add these secrets:
- DOCKER_USERNAME - Your Docker Hub username
- DOCKER_PASSWORD - Your Docker Hub password/token
- AWS_ACCESS_KEY_ID - AWS access key
- AWS_SECRET_ACCESS_KEY - AWS secret key

GitHub Actions should be enabled by default. Verify at: Settings → Actions → General

Install the pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
```

Push to trigger the pipeline:

```bash
git add .
git commit -m "Setup CI/CD pipeline"
git push
```

GitHub Actions will trigger automatically! 🚀
```
┌─────────────────────────────────────────────┐
│           Push/PR to main/develop           │
└──────────────────┬──────────────────────────┘
                   │
        ┌──────────┴──────────┐
        │                     │
  ┌──────────┐          ┌──────────┐
  │   Test   │          │   Lint   │
  └────┬─────┘          └────┬─────┘
       │                     │
       └──────────┬──────────┘
                  │
          ┌───────▼───────┐
          │   Security    │
          └───────┬───────┘
                  │
          ┌───────▼───────┐
          │ Docker Build  │
          └───────┬───────┘
                  │
        ┌─────────▼─────────┐
        │   Deploy (Tag)    │
        └───────────────────┘
```
Triggers:
- Push to main or develop
- Pull requests to main or develop
Jobs:
- ✅ Run tests (Python 3.9, 3.10, 3.11)
- ✅ Check code quality (flake8, black, isort)
- ✅ Security scanning (Bandit, Safety)
- ✅ Test Docker build
- ✅ Upload coverage to Codecov
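A condensed, hypothetical ci.yml showing how these jobs could be wired up; job and step names are illustrative, not copied from the repo:

```yaml
# Hypothetical condensed ci.yml -- illustrative, not the repo's file.
name: CI
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt
      - run: pytest --cov=src --cov=api
```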
Usage:

```bash
# Automatically runs on push
git push origin main
```

View results at: https://github.com/youruser/yourrepo/actions

Docker Build triggers:
- Push to main
- Tags (v*)
- Manual trigger
- Pull requests (build only, no push)

Jobs:
- ✅ Build API, Streamlit, Training images
- ✅ Multi-arch builds (amd64, arm64)
- ✅ Push to Docker Hub with tags
- ✅ Scan images with Trivy
- ✅ Verify images
- ✅ Cleanup old images

Tags created:
- latest - Latest main branch
- v1.0.0 - Specific version
- main-abc123 - Git commit SHA
- pr-42 - Pull request number
Checks:
- ✅ Black formatting
- ✅ isort import sorting
- ✅ Flake8 linting
- ✅ Pylint analysis
- ✅ MyPy type checking
- ✅ Cyclomatic complexity
- ✅ Docstring coverage
- ✅ Dependency licenses

Auto-fix:
- Automatically fixes formatting on PRs
- Commits fixes back to the branch

Scans:
- ✅ Dependency vulnerabilities (Safety, pip-audit)
- ✅ Code security (Bandit, Semgrep)
- ✅ Secret detection (Gitleaks, TruffleHog)
- ✅ SAST (CodeQL)
- ✅ Docker image scanning (Trivy, Grype)
- ✅ Compliance checks

Schedule:
- Runs on every push
- Weekly full scan (Sunday midnight)
Deploying a Docker image from Docker Hub to AWS ECS using ECR, Fargate, and GitHub Actions.
- Setting up AWS infrastructure (ECR, ECS cluster, task definition, service)
- Configuring AWS credentials in GitHub
- Creating a GitHub Actions workflow to automate deployment
Note: The process is explained in detail in DESC.md
https://github.com/FraidoonOmarzai/Chunk_Prediction-MLOps-/actions
✅ Local Kubernetes development
✅ AWS EKS cluster
✅ Kubernetes orchestration
✅ Load balancing
✅ High availability
Enable Kubernetes in Docker Desktop:

1. Open Docker Desktop
2. Go to Settings / Preferences
3. Select Kubernetes
4. Check "Enable Kubernetes"
5. Click Apply & Restart

Also install kubectl (Docker Desktop installs kubectl automatically).

Verify the cluster is running:

```bash
kubectl cluster-info
kubectl get nodes
```

Apply your Kubernetes manifests:

```bash
# Create namespace
kubectl apply -f namespace.yaml

# Deploy API
kubectl apply -f api.yaml

# Deploy Streamlit frontend
kubectl apply -f streamlit.yaml
```

Check the deployment status:

```bash
# Watch pods starting up
kubectl get pods -n churn-prediction -w

# Check services
kubectl get svc -n churn-prediction

# Check deployment details
kubectl get deployments -n churn-prediction
```
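For reference, a condensed, hypothetical k8s/api.yaml consistent with the service names and ports used in this guide (2 replicas, service port 80 to container port 8000); the image name and labels are assumptions:

```yaml
# Hypothetical condensed k8s/api.yaml -- image name, labels, and
# replica count follow this README's conventions, not the repo's file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: churn-prediction
spec:
  replicas: 2
  selector:
    matchLabels: {app: api}
  template:
    metadata:
      labels: {app: api}
    spec:
      containers:
        - name: api
          image: yourusername/churn-prediction-api:latest
          ports: [{containerPort: 8000}]
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: churn-prediction
spec:
  selector: {app: api}
  ports: [{port: 80, targetPort: 8000}]
  type: LoadBalancer
```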
Port-forward the services:

```bash
kubectl port-forward -n churn-prediction svc/streamlit-service 8501:80
kubectl port-forward -n churn-prediction svc/api-service 8000:80
```

Access your app:
- Streamlit UI: http://localhost:8501
- API: http://localhost:8000
- API Health: http://localhost:8000/health
## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────┐
│                   AWS EKS Cluster                   │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │          churn-prediction namespace           │  │
│  │                                               │  │
│  │  ┌────────────────┐    ┌─────────────────┐    │  │
│  │  │  API Service   │    │  Streamlit App  │    │  │
│  │  │  (2 replicas)  │◄───┤  (2 replicas)   │    │  │
│  │  │                │    │                 │    │  │
│  │  │  Port: 8000    │    │  Port: 8501     │    │  │
│  │  └───────┬────────┘    └────────┬────────┘    │  │
│  │          │                      │             │  │
│  │  ┌───────▼────────┐    ┌────────▼────────┐    │  │
│  │  │  LoadBalancer  │    │  LoadBalancer   │    │  │
│  │  │   (AWS NLB)    │    │   (AWS NLB)     │    │  │
│  │  └────────────────┘    └─────────────────┘    │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
                        │
            ┌───────────▼───────────┐
            │      Docker Hub       │
            │                       │
            │  - API Image          │
            │  - Streamlit Image    │
            └───────────────────────┘
```
## 📦 Components

- API Service: FastAPI backend for churn prediction
- Streamlit App: Interactive web interface
- Docker Hub: Container image registry
- AWS EKS: Managed Kubernetes service
- GitHub Actions: CI/CD automation
## 📁 Repository Structure

```
.
├── .github/
│   └── workflows/
│       └── eks_deploy.yml    # GitHub Actions CI/CD pipeline
├── k8s/
│   ├── namespace.yaml        # Kubernetes namespace
│   ├── api.yaml              # API deployment and service
│   ├── streamlit.yaml        # Streamlit deployment and service
│   └── eks_deploy.sh         # Manual deployment script
├── eks-cluster.yaml          # EKS cluster configuration
└── README.md                 # This file
```
- AWS Account
- AWS CLI configured
- kubectl installed
- eksctl installed
- Create EKS Cluster:

```bash
eksctl create cluster -f eks-cluster.yaml
```

- Deploy Application

Option A: Using GitHub Actions (automated)
1. Fork this repository
2. Add GitHub Secrets (see the Configuration section)
3. Push to the main branch
4. GitHub Actions will automatically deploy

Option B: Using the deploy script (manual)

```bash
chmod +x k8s/eks_deploy.sh
./k8s/eks_deploy.sh
```

Option C: Using kubectl (manual)

```bash
aws eks update-kubeconfig --name churn-prediction-cluster --region us-east-1
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/api.yaml
kubectl apply -f k8s/streamlit.yaml
```




