
FraidoonOmarzai/Churn_Prediction-MLOps-


Churn_Prediction(MLOps)

End-to-end MLOps project with Docker, Kubernetes, and AWS

πŸš€ DEVELOPMENT PHASES:

πŸ“Š What I'm Building: A production-ready, enterprise-grade MLOps platform with:

βœ… Phase 1: ML Pipeline & Training
βœ… Phase 2: API & Streamlit UI
βœ… Phase 3: Docker Containers
βœ… Phase 4: Testing Suite
βœ… Phase 5: CI/CD
βœ… Phase 6: Kubernetes Deployment (AWS)

🎯 Business Problem

  • Predict customer churn to enable proactive retention strategies, reducing customer attrition by 15% and improving customer lifetime value.

Key Questions to Answer First:

1. Business Problem & Use Case

What problem are we solving?

  • Suggestion: Let's build a Customer Churn Prediction System (simple, practical, demonstrates full MLOps pipeline)

What's the impact?

  • Business value: Reduce customer churn by 15%
  • Target: Marketing team can proactively reach at-risk customers

2. Data Questions

What data do we have?

  • Customer demographics (age, location, tenure)
  • Usage patterns (login frequency, feature usage)
  • Transaction history (revenue, plan type)

Data size & freshness?

  • For demo: Use Kaggle dataset (Telco Customer Churn)
  • In production: Daily batch updates from database

3. ML Model Requirements

What's good enough?

Target Metrics:

  • Recall β‰₯ 80% (catch most churners)
  • Precision β‰₯ 70% (avoid too many false alarms)
  • F1-Score β‰₯ 0.75
  • AUC-ROC β‰₯ 0.85

Model type?

  • Start: Logistic Regression (baseline)
  • Experiment: Random Forest, XGBoost, LightGBM

4. System Requirements

  • Latency: < 200ms for predictions (REST API)
  • Throughput: Handle 100 requests/second
  • Availability: 99.5% uptime

πŸ› οΈ Installation & Setup

Prerequisites

  • Python 3.10+
  • pip
  • Git

1. Clone the Repository

git clone https://github.com/FraidoonOmarzai/Churn_Prediction-MLOps-.git
cd Churn_Prediction-MLOps-

2. Create Virtual Environment

# Create virtual environment
conda create -p ./venv python=3.11 -y

# Activate virtual environment
conda activate ./venv

3. Install Dependencies

pip install -r requirements.txt

4. Create Directory Structure (define the project template)

touch template.py
python3 template.py
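template.py itself is not shown in this README; a minimal sketch of what such a scaffolding script typically does (the file list below is illustrative, not the repo's exact layout):

```python
from pathlib import Path

# Illustrative file list -- not the repo's exact layout
FILES = [
    "src/__init__.py",
    "src/components/data_ingestion.py",
    "src/pipeline/training_pipeline.py",
    "scripts/train.py",
    "config/config.yaml",
]

def create_structure(files, base="."):
    """Create each file (and its parent directories) if missing."""
    created = []
    for rel in files:
        path = Path(base) / rel
        path.parent.mkdir(parents=True, exist_ok=True)  # make intermediate dirs
        path.touch(exist_ok=True)                       # create empty file if absent
        created.append(path)
    return created

if __name__ == "__main__":
    for p in create_structure(FILES):
        print(f"created: {p}")
```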

5. Define Logger and Custom Exception
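A minimal sketch of what the logger and custom exception modules usually contain; the names here are assumptions, and the format string is chosen to match the log format shown later in this README:

```python
import logging
import sys

def get_logger(name="ChurnPrediction"):
    """Return a configured logger, e.g. [2024-11-04 10:30:45] INFO - ChurnPrediction - msg."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(
            "[%(asctime)s] %(levelname)s - %(name)s - %(message)s",
            "%Y-%m-%d %H:%M:%S",
        ))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

class ChurnPredictionError(Exception):
    """Custom exception that records which module raised the error."""
    def __init__(self, message, module="unknown"):
        super().__init__(message)
        self.module = module

    def __str__(self):
        return f"Error in [{self.module}]: {self.args[0]}"
```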


ML Pipeline (Phase 1)

βœ… Data ingestion & validation
βœ… Feature engineering
βœ… Model training (4 algorithms)
βœ… MLflow experiment tracking
βœ… Model evaluation & selection

πŸ“₯ Download Dataset

Option 1: Automatic Download (Recommended)

python scripts/download_data.py

Option 2: Manual Download

  1. Visit: https://www.kaggle.com/datasets/blastchar/telco-customer-churn
  2. Download the dataset
  3. Save as: data/raw/churn_data.csv

Option 3: Kaggle API

# Install Kaggle
pip install kaggle

# Set up credentials (~/.kaggle/kaggle.json)
# Then run:
python scripts/download_data.py

Run Jupyter Notebook for Experiments

βœ… Typical Experiment Workflow

  1. Prototype in Jupyter Notebook

  • You explore ideas, try models, test functions, visualize results, etc.
  • This is the “playground” phase.

  2. Move Stable Code to Python Scripts (.py)

  • Once your experiment code is working in the notebook, you usually move the clean, reusable parts into Python files:

    • Components:
      • src/components/data_ingestion.py
      • src/components/data_validation.py
      • src/components/data_preprocessing.py
      • src/components/model_trainer.py
      • src/components/model_evaluation.py
    • Pipelines:
      • src/pipeline/training_pipeline.py
    • Utilities:
      • scripts/train.py

  3. Run Experiments From Scripts

  • Running .py files is better for:
    • long training jobs
    • large experiments
    • automated logging
    • reproducibility

🎯 Run Training Pipeline

Execute Complete Pipeline

python scripts/train.py

What Happens:

  1. Data Ingestion: Loads and splits data (80/20)
  2. Data Validation: Checks schema and quality
  3. Data Preprocessing: Cleans and transforms features
  4. Model Training: Trains 4 models with MLflow tracking
  5. Model Evaluation: Compares models and selects best
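The five stages above can be sketched as a simple orchestrator; the stage methods are illustrative stubs, not the repo's actual component APIs:

```python
class TrainingPipeline:
    """Runs the five stages in order; each stub stands in for a src/components module."""

    STAGES = [
        "data_ingestion",       # load and split data (80/20)
        "data_validation",      # check schema and quality
        "data_preprocessing",   # clean and transform features
        "model_training",       # train 4 models with MLflow tracking
        "model_evaluation",     # compare models and select best
    ]

    def __init__(self):
        self.completed = []

    def run(self):
        for stage in self.STAGES:
            getattr(self, stage)()      # dispatch to the stage method
            self.completed.append(stage)
        return self.completed

    def data_ingestion(self):
        pass

    def data_validation(self):
        pass

    def data_preprocessing(self):
        pass

    def model_training(self):
        pass

    def model_evaluation(self):
        pass
```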

Expected Output:

======================================================================
TRAINING PIPELINE COMPLETED SUCCESSFULLY!
======================================================================

Best Model: xgboost
Models trained: 4
Preprocessor saved at: artifacts/preprocessors/preprocessor.pkl

Check MLflow UI for detailed experiment tracking:
  Run: mlflow ui
  Open: http://localhost:5000
======================================================================

πŸ“Š View Experiments with MLflow

Start MLflow UI

mlflow ui

Access Dashboard

Open browser: http://localhost:5000

What You'll See:

  • All experiment runs
  • Parameters for each model
  • Metrics (accuracy, precision, recall, F1, ROC-AUC)
  • Model artifacts
  • Comparison charts

πŸ“ˆ Evaluation Metrics

Model Performance Targets

  • Recall: β‰₯ 80% (catch most churners)
  • Precision: β‰₯ 70% (avoid false alarms)
  • F1-Score: β‰₯ 0.75
  • ROC-AUC: β‰₯ 0.85

Metrics Calculated

  • Accuracy
  • Precision, Recall, F1-Score
  • ROC-AUC
  • Confusion Matrix
  • Specificity, Sensitivity
  • Classification Report

View Results

# Check evaluation report
cat artifacts/metrics/evaluation_report.json

# Check validation report
cat artifacts/validation_report.json

πŸ”§ Configuration

Main Configuration (config/config.yaml)

  • Data paths
  • Train/test split ratio
  • Feature lists
  • Artifact locations
  • MLflow settings
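An illustrative sketch of what config/config.yaml might look like; the key names and paths here are assumptions, not the repo's actual schema:

```yaml
data:
  raw_path: data/raw/churn_data.csv
  test_size: 0.2
  random_state: 42

features:
  target: Churn
  numerical: [tenure, MonthlyCharges, TotalCharges]
  categorical: [gender, Partner, Contract, PaymentMethod]

artifacts:
  model_dir: artifacts/models
  preprocessor_path: artifacts/preprocessors/preprocessor.pkl
  metrics_dir: artifacts/metrics

mlflow:
  experiment_name: churn-prediction
  tracking_uri: http://localhost:5000
```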

Model Configuration (config/model_config.yaml)

  • Hyperparameters for each model
  • Algorithm-specific settings
  • Training parameters

πŸ“ Logs

Log Files

Logs are saved in: logs/

Log Format

[2024-11-04 10:30:45] INFO - ChurnPrediction - Starting training pipeline
[2024-11-04 10:30:46] INFO - ChurnPrediction - Data loaded: (7043, 21)

πŸ§ͺ Generated Artifacts

After Training:

artifacts/
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ logistic_regression.pkl
β”‚   β”œβ”€β”€ random_forest.pkl
β”‚   β”œβ”€β”€ xgboost.pkl
β”‚   └── lightgbm.pkl
β”œβ”€β”€ preprocessors/
β”‚   β”œβ”€β”€ preprocessor.pkl
β”‚   └── preprocessor_label_encoder.pkl
β”œβ”€β”€ metrics/
β”‚   └── evaluation_report.json
└── validation_report.json

πŸŽ“ Model Training Details

Models Trained:

  1. Logistic Regression (Baseline)
  2. Random Forest (Ensemble)
  3. XGBoost (Gradient Boosting)
  4. LightGBM (Fast Gradient Boosting)

Training Process:

  • Stratified train/test split (80/20)
  • Standard scaling for numerical features
  • One-hot encoding for categorical features
  • Automated hyperparameter configuration
  • MLflow tracking for all experiments
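As one concrete example, the stratified 80/20 split can be sketched in plain Python (the real pipeline presumably uses scikit-learn's train_test_split with stratify=y):

```python
import random
from collections import defaultdict

def stratified_split(labels, test_ratio=0.2, seed=42):
    """Return (train_idx, test_idx) preserving each class's proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)          # group row indices by class label
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)              # randomize within each class
        n_test = round(len(idxs) * test_ratio)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)
```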

API & UI (Phase 2)

βœ… FastAPI REST API
βœ… Streamlit dashboard
βœ… Real-time predictions
βœ… Batch processing

Phase 2 adds FastAPI REST API and Streamlit Dashboard for real-time predictions and interactive visualizations.


πŸ“¦ New Components Added

1. Prediction Pipeline (src/pipeline/prediction_pipeline.py)

  • Loads trained models for inference
  • Handles single and batch predictions
  • Calculates risk levels
  • Feature importance extraction
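The risk-level calculation can be illustrated with a small helper; the thresholds are assumptions, chosen to be consistent with the example API response later in this README, where probability 0.7245 maps to "High":

```python
def risk_level(churn_probability):
    """Map a churn probability to a risk bucket (assumed thresholds)."""
    if churn_probability >= 0.7:
        return "High"
    if churn_probability >= 0.4:
        return "Medium"
    return "Low"
```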

2. FastAPI REST API (api/)

  • RESTful endpoints for predictions
  • Request/response validation with Pydantic
  • Auto-generated API documentation
  • Health checks and monitoring

3. Streamlit Dashboard (streamlit_app/)

  • Interactive web interface
  • Single customer prediction
  • Batch CSV upload
  • Visualizations and analytics

πŸš€ Quick Start

Step 1: Install New Dependencies

pip install --upgrade pip
pip install fastapi uvicorn[standard] streamlit plotly python-multipart

Or install from updated requirements.txt:

pip install -r requirements.txt

Step 2: Ensure Model is Trained

# If you haven't trained models yet
python scripts/train.py

Step 3: Start the FastAPI Server

python run_api.py

API will be available at: http://localhost:8000 (interactive docs at http://localhost:8000/docs)

Step 4: Start the Streamlit Dashboard (New Terminal)

python run_streamlit.py

Dashboard will open at: http://localhost:8501


πŸ“‘ API Endpoints

1. Health Check

GET http://localhost:8000/health

Response:

{
  "status": "healthy",
  "model_loaded": true,
  "preprocessor_loaded": true,
  "api_version": "1.0.0"
}

2. Single Prediction

POST http://localhost:8000/predict

Request Body:

{
  "customer": {
    "gender": "Female",
    "SeniorCitizen": 0,
    "Partner": "Yes",
    "Dependents": "No",
    "tenure": 12,
    "PhoneService": "Yes",
    "MultipleLines": "No",
    "InternetService": "Fiber optic",
    "OnlineSecurity": "No",
    "OnlineBackup": "Yes",
    "DeviceProtection": "No",
    "TechSupport": "No",
    "StreamingTV": "Yes",
    "StreamingMovies": "No",
    "Contract": "Month-to-month",
    "PaperlessBilling": "Yes",
    "PaymentMethod": "Electronic check",
    "MonthlyCharges": 70.35,
    "TotalCharges": 840.5
  }
}

Response:

{
  "prediction": "Yes",
  "prediction_label": 1,
  "churn_probability": 0.7245,
  "no_churn_probability": 0.2755,
  "confidence": 0.7245,
  "risk_level": "High"
}

3. Batch Prediction

POST http://localhost:8000/predict/batch

Request Body:

{
  "customers": [
    {
      /* customer 1 data */
    },
    {
      /* customer 2 data */
    }
  ]
}

Response:

{
  "predictions": [
    /* array of predictions */
  ],
  "total_customers": 2,
  "high_risk_count": 1
}

4. Model Information

GET http://localhost:8000/model/info

5. Feature Importance

GET http://localhost:8000/model/feature-importance

πŸ§ͺ Testing the API

Option 1: Interactive Docs (Recommended)

  1. Start API server: python run_api.py
  2. Open browser: http://localhost:8000/docs
  3. Try out endpoints directly in the browser

Option 2: Test Script

python test_api.py

Option 3: cURL Commands

# Health check
curl http://localhost:8000/health

# Single prediction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @sample_request.json

Option 4: Python Requests

import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={"customer": {/* customer data */}}
)
print(response.json())

Containerization (Phase 3)

βœ… Docker images (API, Streamlit, Training)
βœ… Docker Compose orchestration
βœ… Multi-stage builds
βœ… Pushed to Docker Hub

Phase 3 containerizes the entire application using Docker, enabling consistent deployments across any environment.


πŸ“¦ What's Been Created

Docker Images

  1. API Image - FastAPI service
  2. Streamlit Image - Web dashboard
  3. Training Image - Model training pipeline

Configuration Files

  1. docker/Dockerfile.api - API container definition
  2. docker/Dockerfile.streamlit - Streamlit container
  3. docker/Dockerfile.training - Training container
  4. docker-compose.yml - Multi-container orchestration
  5. .dockerignore - Exclude unnecessary files
  6. .env.example - Environment template
  7. Build & push scripts

mlops-churn-prediction/
β”‚
β”œβ”€β”€ docker/
β”‚   β”œβ”€β”€ Dockerfile.api           # NEW - API container
β”‚   β”œβ”€β”€ Dockerfile.streamlit     # NEW - Streamlit container
β”‚   └── Dockerfile.training      # NEW - Training container
β”‚
β”œβ”€β”€ docker-compose.yml           # NEW - Orchestration
β”œβ”€β”€ .dockerignore                # NEW - Exclude files
β”œβ”€β”€ .env.example                 # NEW - Environment template
β”‚
└── scripts/
    β”œβ”€β”€ build_images.sh/bat          # NEW - Build script
    β”œβ”€β”€ push_images.sh/bat           # NEW - Push to Docker Hub
    └── run_docker.sh/bat            # NEW - Run containers (.sh for Mac/Linux)
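A hedged sketch of what docker/Dockerfile.api could look like; the module path api.main:app and the layer layout are assumptions, not the repo's actual file:

```dockerfile
# Illustrative docker/Dockerfile.api -- a sketch, not the repo's actual file
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code and trained artifacts
COPY src/ src/
COPY api/ api/
COPY artifacts/ artifacts/

EXPOSE 8000

# Basic container health check against the /health endpoint
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"

CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```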

πŸš€ Quick Start

Step 1: Prerequisites

# Install Docker Desktop
# Download from: https://www.docker.com/products/docker-desktop

# Verify installation
docker --version
docker-compose --version

Step 2: Setup Environment

# Edit .env file
# Set DOCKER_USERNAME to your Docker Hub username
nano .env  # or use your favorite editor

Example .env:

DOCKER_USERNAME=yourusername
VERSION=v1.0.0

Step 3: Make Scripts Executable (Linux/Mac)

chmod +x scripts/build_images.sh
chmod +x scripts/push_images.sh
chmod +x scripts/run_docker.sh

Step 4: Build Images

Linux/Mac:

./scripts/build_images.sh

Windows:

scripts\build_images.bat

This will build all 3 Docker images (~5-10 minutes first time).

Step 5: Run with Docker Compose

# Start all services
docker-compose up -d

# Or use the helper script
./scripts/run_docker.sh start

Step 6: Access Services

  • API: http://localhost:8000
  • Streamlit dashboard: http://localhost:8501
  • MLflow UI: http://localhost:5000

πŸ—οΈ Docker Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Docker Compose Network           β”‚
β”‚                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   API    β”‚  β”‚Streamlit β”‚   β”‚MLflow β”‚  β”‚
β”‚  β”‚  :8000   β”‚  β”‚  :8501   β”‚   β”‚ :5000 β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”¬β”€β”€β”€β”˜  β”‚
β”‚       β”‚             β”‚             β”‚      β”‚
β”‚       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚
β”‚            Shared Network                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                    β”‚
    [Volumes]            [Volumes]
    artifacts/            mlflow/
     logs/                 data/
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Dockerfile.training β”‚ ──build──> 🐳 app-image
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Dockerfile.streamlit β”‚ ──build──> 🐳 db-image
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Dockerfile.api  β”‚ ──build──> 🐳 api-image
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
            β”‚
            ↓
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ docker-compose   β”‚ ──orchestrates──> πŸš€ All containers running together
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🚒 Pushing to Docker Hub

Step 1: Create Docker Hub Account

  1. Go to https://hub.docker.com
  2. Sign up for free account
  3. Create access token (Settings β†’ Security β†’ New Access Token)

Step 2: Login to Docker Hub

docker login
# Enter username and password/token

Step 3: Push Images

Linux/Mac:

./scripts/push_images.sh

Windows:

scripts\push_images.bat

Manual Push:

docker push username/churn-prediction-api:latest
docker push username/churn-prediction-streamlit:latest
docker push username/churn-prediction-training:latest

Step 4: Verify

Visit: https://hub.docker.com/u/yourusername

πŸ”§ Customization

Change Ports

Edit docker-compose.yml:

services:
  api:
    ports:
      - "8080:8000" # Change 8080 to your port

Add Environment Variables

services:
  api:
    environment:
      - CUSTOM_VAR=value
      - LOG_LEVEL=DEBUG

Use Different Model

Edit .env:

MODEL_PATH=artifacts/models/random_forest.pkl

Memory Limits

services:
  api:
    deploy:
      resources:
        limits:
          cpus: "1"
          memory: 1G

πŸ§ͺ Testing Dockerized Services

Test API Health

curl http://localhost:8000/health

Test Prediction

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @test_data.json

Test Streamlit

Open browser: http://localhost:8501

Test MLflow

Open browser: http://localhost:5000

Run Training in Container

docker-compose --profile training up training

πŸ“Š Monitoring

Container Stats

# Real-time stats
docker stats

# Specific container
docker stats churn-prediction-api

Logs

# All logs
docker-compose logs

# Follow logs
docker-compose logs -f

# Last 100 lines
docker-compose logs --tail=100

Health Checks

# Check health
docker ps

# Inspect health
docker inspect churn-prediction-api | grep Health -A 10

🎯 Production Deployment

Using Built Images

On any machine with Docker:

# Pull images
docker pull username/churn-prediction-api:latest
docker pull username/churn-prediction-streamlit:latest

# Run
docker-compose up -d

Environment-Specific Configs

# Development
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up

# Production
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up

Testing (Phase 4)

βœ… Unit tests (70%+ coverage)
βœ… Integration tests
βœ… Data quality tests
βœ… Model performance tests
βœ… Automated test suite

Phase 4 implements a comprehensive testing framework covering unit tests, integration tests, data validation, model performance, and API testing.


πŸ“¦ What's Been Created

Test Structure (4 Categories)

  1. Unit Tests - Individual component testing
  2. Integration Tests - Pipeline and API testing
  3. Data Tests - Data quality and validation
  4. Model Tests - Performance and fairness testing

Test Files

  1. pytest.ini - Pytest configuration
  2. .coveragerc - Coverage settings
  3. tests/conftest.py - Shared fixtures
  4. tests/unit/test_data_ingestion.py
  5. tests/unit/test_prediction_pipeline.py
  6. tests/integration/test_api_endpoints.py
  7. tests/data/test_data_quality.py
  8. tests/model/test_model_performance.py
  9. scripts/run_tests.sh - Test runner (Linux/Mac)
  10. scripts/run_tests.bat - Test runner (Windows)
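tests/conftest.py typically exposes shared fixtures such as a sample customer record. A hedged sketch (field values mirror the API example in this README, trimmed for brevity; in conftest.py this function would be wrapped in @pytest.fixture):

```python
def sample_customer():
    """A single Telco-style customer record shared across tests (trimmed field set)."""
    return {
        "gender": "Female",
        "SeniorCitizen": 0,
        "Partner": "Yes",
        "Dependents": "No",
        "tenure": 12,
        "PhoneService": "Yes",
        "Contract": "Month-to-month",
        "PaperlessBilling": "Yes",
        "PaymentMethod": "Electronic check",
        "MonthlyCharges": 70.35,
        "TotalCharges": 840.5,
    }
```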

πŸš€ Quick Start

Step 1: Install Test Dependencies

pip install pytest pytest-cov pytest-mock pytest-timeout

Or use existing requirements.txt:

pip install -r requirements.txt

Step 2: Run All Tests

# Linux/Mac
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh all

# Windows
scripts\run_tests.bat all

# Or directly with pytest
pytest -v

Step 3: View Coverage Report

./scripts/run_tests.sh coverage

# Then open in browser
open htmlcov/index.html  # Mac
xdg-open htmlcov/index.html  # Linux
start htmlcov\index.html  # Windows

πŸ“Š Test Categories

1. Unit Tests (tests/unit/)

Test individual components in isolation.

Run:

pytest -m unit -v
./scripts/run_tests.sh unit

Tests:

  • Data ingestion logic
  • Data preprocessing
  • Model training components
  • Prediction pipeline
  • Utility functions

Example:

pytest tests/unit/test_data_ingestion.py -v

2. Integration Tests (tests/integration/)

Test complete workflows and API endpoints.

Run:

pytest -m integration -v
./scripts/run_tests.sh integration

Tests:

  • End-to-end training pipeline
  • API endpoint responses
  • Service communication
  • Complete prediction workflow

Example:

pytest tests/integration/test_api_endpoints.py -v

3. Data Quality Tests (tests/data/)

Validate data quality and integrity.

Run:

pytest -m data -v
./scripts/run_tests.sh data

Tests:

  • Missing values
  • Data types
  • Valid categories
  • Numerical ranges
  • Data consistency
  • Class distribution

Example:

pytest tests/data/test_data_quality.py -v

4. Model Performance Tests (tests/model/)

Ensure model meets performance requirements.

Run:

pytest -m model -v
./scripts/run_tests.sh model

Tests:

  • Minimum accuracy (60%)
  • Minimum precision (50%)
  • Minimum recall (50%)
  • Minimum F1 score (55%)
  • Minimum ROC-AUC (70%)
  • No constant predictions
  • Fairness across groups

Example:

pytest tests/model/test_model_performance.py -v
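The threshold checks above can be sketched as a single gate function (pytest wiring omitted; the minimums are copied from the list above):

```python
# Minimum metric values from the list above
THRESHOLDS = {
    "accuracy": 0.60,
    "precision": 0.50,
    "recall": 0.50,
    "f1": 0.55,
    "roc_auc": 0.70,
}

def failing_metrics(metrics, thresholds=THRESHOLDS):
    """Return metrics that fall below their minimum; an empty dict means all gates pass."""
    return {
        name: metrics.get(name, 0.0)
        for name, minimum in thresholds.items()
        if metrics.get(name, 0.0) < minimum
    }
```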

🎯 Test Markers

Tests are organized using pytest markers:

# Run specific marker
pytest -m unit
pytest -m integration
pytest -m data
pytest -m model
pytest -m api
pytest -m slow
pytest -m requires_model

# Combine markers
pytest -m "unit and not slow"
pytest -m "integration or api"

# Exclude markers
pytest -m "not slow"

πŸ“ˆ Coverage Requirements

Minimum Coverage: 70%

Check coverage:

pytest --cov=src --cov=api --cov-report=term-missing

Coverage Reports Generated:

  1. Terminal - Summary in console
  2. HTML - Detailed report in htmlcov/
  3. XML - For CI/CD integration

View HTML Report:

# Generate report
pytest --cov=src --cov=api --cov-report=html

# Open in browser
open htmlcov/index.html

πŸ“‹ Test Runner Commands

Basic Commands

# Run all tests
pytest

# Verbose output
pytest -v

# Show print statements
pytest -s

# Stop on first failure
pytest -x

# Run last failed tests
pytest --lf

# Run specific file
pytest tests/unit/test_data_ingestion.py

# Run specific test
pytest tests/unit/test_data_ingestion.py::test_function_name

Script Commands

./scripts/run_tests.sh all        # All tests
./scripts/run_tests.sh unit       # Unit tests
./scripts/run_tests.sh integration # Integration tests
./scripts/run_tests.sh data       # Data tests
./scripts/run_tests.sh model      # Model tests
./scripts/run_tests.sh api        # API tests
./scripts/run_tests.sh fast       # Exclude slow tests
./scripts/run_tests.sh coverage   # With coverage
./scripts/run_tests.sh ci         # CI pipeline
./scripts/run_tests.sh clean      # Clean artifacts

πŸ“Š Performance Testing

Test Execution Time

# Show slowest tests
pytest --durations=10

# Set timeout
pytest --timeout=60

Parallel Execution

# Install plugin
pip install pytest-xdist

# Run in parallel
pytest -n auto



CI/CD (Phase 5)

βœ… GitHub Actions workflows
βœ… Automated testing
βœ… Docker builds & pushes
βœ… Security scanning
βœ… Code quality checks
βœ… Deployment automation

Phase 5 implements comprehensive CI/CD pipelines using GitHub Actions for automated testing, building, security scanning, and deployment.


πŸ“¦ What's Been Created

GitHub Actions Workflows

  1. CI Pipeline (ci.yml) - Automated testing & validation
  2. Docker Build (docker-build.yml) - Build & push Docker images
  3. Code Quality (code-quality.yml) - Linting & formatting
  4. Security (security.yml) - Vulnerability scanning
  5. Deployment (deploy.yml) - AWS EKS deployment

Configuration Files

  1. .pre-commit-config.yaml - Pre-commit hooks
  2. .flake8 - Flake8 linting config
  3. pyproject.toml - Black, isort, mypy config

πŸš€ Quick Start

Step 1: Setup GitHub Repository

# Initialize git (if not already)
git init
git add .
git commit -m "Initial commit"

# Create GitHub repository and push
git remote add origin https://github.com/yourusername/your-repo.git
git push -u origin main

Step 2: Configure GitHub Secrets

Go to: Settings β†’ Secrets and variables β†’ Actions

Add these secrets:

  • DOCKER_USERNAME - Your Docker Hub username
  • DOCKER_PASSWORD - Your Docker Hub password/token
  • AWS_ACCESS_KEY_ID - AWS access key
  • AWS_SECRET_ACCESS_KEY - AWS secret key

Step 3: Enable GitHub Actions

GitHub Actions should be enabled by default. Verify at: Settings β†’ Actions β†’ General

Step 4: Install Pre-commit Hooks (Local)

pip install pre-commit
pre-commit install

Step 5: Make Your First Commit

git add .
git commit -m "Setup CI/CD pipeline"
git push

GitHub Actions will automatically trigger! πŸŽ‰


πŸ“Š CI/CD Pipeline Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           Push/PR to main/develop           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        ↓                     ↓
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚   Test   β”‚         β”‚   Lint   β”‚
  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
       β”‚                    β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  ↓
           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
           β”‚  Security   β”‚
           β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
           β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”
           β”‚ Docker Buildβ”‚
           β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚   Deploy (Tag)  β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”§ Workflow Details

1. CI Pipeline (ci.yml)

Triggers:

  • Push to main or develop
  • Pull requests to main or develop

Jobs:

  • βœ… Run tests (Python 3.9, 3.10, 3.11)
  • βœ… Check code quality (flake8, black, isort)
  • βœ… Security scanning (Bandit, Safety)
  • βœ… Test Docker build
  • βœ… Upload coverage to Codecov

Usage:

# Automatically runs on push
git push origin main

# View results
https://github.com/youruser/yourrepo/actions
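A condensed sketch of what ci.yml might contain for the test matrix and coverage upload; job names and step details are assumptions, not the repo's actual workflow:

```yaml
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt
      - run: pytest --cov=src --cov=api --cov-report=xml
      - uses: codecov/codecov-action@v4   # upload coverage report
```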

2. Docker Build & Push (docker-build.yml)

Triggers:

  • Push to main
  • Tags (v*)
  • Manual trigger
  • Pull requests (build only, no push)

Jobs:

  • βœ… Build API, Streamlit, Training images
  • βœ… Multi-arch builds (amd64, arm64)
  • βœ… Push to Docker Hub with tags
  • βœ… Scan images with Trivy
  • βœ… Verify images
  • βœ… Cleanup old images

Tags Created:

  • latest - Latest main branch
  • v1.0.0 - Specific version
  • main-abc123 - Git commit SHA
  • pr-42 - Pull request number

3. Code Quality (code-quality.yml)

Checks:

  • βœ… Black formatting
  • βœ… isort import sorting
  • βœ… Flake8 linting
  • βœ… Pylint analysis
  • βœ… MyPy type checking
  • βœ… Cyclomatic complexity
  • βœ… Docstring coverage
  • βœ… Dependency licenses

Auto-fix:

  • Automatically fixes formatting on PRs
  • Commits fixes back to branch

4. Security Scanning (security.yml)

Scans:

  • βœ… Dependency vulnerabilities (Safety, pip-audit)
  • βœ… Code security (Bandit, Semgrep)
  • βœ… Secret detection (Gitleaks, TruffleHog)
  • βœ… SAST (CodeQL)
  • βœ… Docker image scanning (Trivy, Grype)
  • βœ… Compliance checks

Schedule:

  • Runs on every push
  • Weekly full scan (Sunday midnight)

5. Deployment (aws_ecs_deploy.yml)

Deploying a Docker image from Docker Hub to AWS ECS using ECR, Fargate, and GitHub Actions.

The process involves:

  • Setting up AWS infrastructure (ECR, ECS cluster, task definition, service)
  • Configuring AWS credentials in GitHub
  • Creating a GitHub Actions workflow to automate deployment

Note: The process is explained in detail in DESC.md.


πŸ“ˆ Monitoring CI/CD

View Workflow Runs

https://github.com/FraidoonOmarzai/Churn_Prediction-MLOps-/actions



Cloud Deployment (Phase 6)

βœ… Local Kubernetes Development
βœ… AWS EKS cluster
βœ… Kubernetes orchestration
βœ… Load balancing
βœ… High availability

Part 1: Running Locally

Prerequisites

1. Enable Kubernetes in Docker Desktop

# 1. Open Docker Desktop
# 2. Go to Settings / Preferences
# 3. Select Kubernetes
# 4. Check βœ… Enable Kubernetes
# 5. Click Apply & Restart

  • Also install kubectl: Docker Desktop installs kubectl automatically.

2. Verify the cluster is running

kubectl cluster-info
kubectl get nodes

3. Apply your Kubernetes manifests

# Create namespace
kubectl apply -f namespace.yaml

# Deploy API
kubectl apply -f api.yaml

# Deploy Streamlit frontend
kubectl apply -f streamlit.yaml

4. Check deployment status

# Watch pods starting up
kubectl get pods -n churn-prediction -w

# Check services
kubectl get svc -n churn-prediction

# Check deployment details
kubectl get deployments -n churn-prediction

5. Port-forward the services

kubectl port-forward -n churn-prediction svc/streamlit-service 8501:80
kubectl port-forward -n churn-prediction svc/api-service 8000:80

6. Access your app

Streamlit UI: http://localhost:8501
API: http://localhost:8000
API Health: http://localhost:8000/health

Part 2: AWS(EKS) Deployment with CI/CD

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    AWS EKS Cluster                  β”‚
β”‚                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚         churn-prediction namespace           β”‚   β”‚
β”‚  β”‚                                              β”‚   β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚   β”‚
β”‚  β”‚  β”‚  API Service   β”‚    β”‚ Streamlit App   β”‚   β”‚   β”‚
β”‚  β”‚  β”‚  (2 replicas)  │◄────  (2 replicas)   β”‚   β”‚   β”‚
β”‚  β”‚  β”‚                β”‚    β”‚                 β”‚   β”‚   β”‚
β”‚  β”‚  β”‚  Port: 8000    β”‚    β”‚   Port: 8501    β”‚   β”‚   β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚   β”‚
β”‚  β”‚           β”‚                      β”‚           β”‚   β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚   β”‚
β”‚  β”‚  β”‚ LoadBalancer   β”‚    β”‚  LoadBalancer   β”‚   β”‚   β”‚
β”‚  β”‚  β”‚  (AWS NLB)     β”‚    β”‚   (AWS NLB)     β”‚   β”‚   β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚   Docker Hub        β”‚
            β”‚                     β”‚
            β”‚  - API Image        β”‚
            β”‚  - Streamlit Image  β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“¦ Components

  • API Service: FastAPI backend for churn prediction
  • Streamlit App: Interactive web interface
  • Docker Hub: Container image registry
  • AWS EKS: Managed Kubernetes service
  • GitHub Actions: CI/CD automation

πŸ“ Repository Structure

.
β”œβ”€β”€ .github/
β”‚   └── workflows/
β”‚       └── eks_deploy.yml      # GitHub Actions CI/CD pipeline
β”œβ”€β”€ k8s/
β”‚   β”œβ”€β”€ namespace.yaml          # Kubernetes namespace
β”‚   β”œβ”€β”€ api.yaml                # API deployment and service
β”‚   β”œβ”€β”€ streamlit.yaml          # Streamlit deployment and service
β”‚   └── eks_deploy.sh           # Manual deployment script
β”œβ”€β”€ eks-cluster.yaml            # EKS cluster configuration
└── README.md                   # This file

AWS EKS Deployment

Prerequisites

  • AWS Account
  • AWS CLI configured
  • kubectl installed
  • eksctl installed

1. Create EKS Cluster

eksctl create cluster -f eks-cluster.yaml

2. Deploy Application

  • Option A: Using GitHub Actions (Automated)

# 1. Fork this repository
# 2. Add GitHub Secrets (see Configuration section)
# 3. Push to main branch
# 4. GitHub Actions will automatically deploy

  • Option B: Using Deploy Script (Manual)

chmod +x k8s/eks_deploy.sh
./k8s/eks_deploy.sh

  • Option C: Using kubectl (Manual)

aws eks update-kubeconfig --name churn-prediction-cluster --region us-east-1
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/api.yaml
kubectl apply -f k8s/streamlit.yaml
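For reference, a minimal sketch of what k8s/api.yaml could contain; the image name, probe, and port mapping are assumptions chosen to be consistent with the port-forward commands shown above (service port 80 → container port 8000):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: churn-prediction
spec:
  replicas: 2                       # two API replicas, as in the architecture diagram
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: yourusername/churn-prediction-api:latest
          ports:
            - containerPort: 8000
          readinessProbe:           # only route traffic once /health responds
            httpGet:
              path: /health
              port: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: churn-prediction
spec:
  type: LoadBalancer               # provisions an AWS NLB on EKS
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8000
```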
