# End-to-End MLOps Project with Docker, Kubernetes, and AWS

🚀 What I'm Building: A production-ready, enterprise-grade MLOps platform with:

✅ Phase 1: ML Pipeline & Training
✅ Phase 2: API & Streamlit UI
✅ Phase 3: Docker Containers
✅ Phase 4: Testing Suite
✅ Phase 5: CI/CD
✅ Phase 6: Kubernetes Deployment (AWS)
What problem are we solving?
- Suggestion: build a Customer Churn Prediction System (simple, practical, and demonstrates the full MLOps pipeline)
- Goal: predict customer churn to enable proactive retention strategies, reducing customer attrition by 15% and improving customer lifetime value

What's the impact?
- Business value: reduce customer churn by 15%
- Target: the marketing team can proactively reach at-risk customers
What data do we have?
- Customer demographics (age, location, tenure)
- Usage patterns (login frequency, feature usage)
- Transaction history (revenue, plan type)
Data size & freshness?
- For demo: Use Kaggle dataset (Telco Customer Churn)
- In production: Daily batch updates from database
What's good enough?
Target Metrics:
- Recall ≥ 80% (catch most churners)
- Precision ≥ 70% (avoid too many false alarms)
- F1-Score ≥ 0.75
- AUC-ROC ≥ 0.85
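These thresholds map directly onto confusion-matrix counts. A small self-contained sketch of how the gate could be checked; the counts below are illustrative, not results from the actual project:

```python
# How the gate metrics relate to confusion-matrix counts.
# The counts used below are illustrative, not project results.
def churn_metrics(tp, fp, fn, tn):
    recall = tp / (tp + fn)         # share of real churners we caught
    precision = tp / (tp + fp)      # share of flagged customers who really churn
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}

# Example: 85 churners caught, 15 missed, 30 false alarms
m = churn_metrics(tp=85, fp=30, fn=15, tn=870)
passed = m["recall"] >= 0.80 and m["precision"] >= 0.70 and m["f1"] >= 0.75
```

With these example counts the model clears all three thresholds.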
Model type?
- Start: Logistic Regression (baseline)
- Experiment: Random Forest, XGBoost, LightGBM
- Latency: < 200ms for predictions (REST API)
- Throughput: Handle 100 requests/second
- Availability: 99.5% uptime
- Python 3.10+
- pip
- Git
```bash
# Clone the repository
git clone https://github.com/FraidoonOmarzai/Chunk_Prediction-MLOps-.git
cd Chunk_Prediction-MLOps-

# Create virtual environment
conda create -p ./venv python=3.11 -y

# Activate virtual environment
conda activate ./venv

# Install dependencies
pip install -r requirements.txt

# Create the project structure
touch template.py
python3 template.py
```
✅ Data ingestion & validation
✅ Feature engineering
✅ Model training (4 algorithms)
✅ MLflow experiment tracking
✅ Model evaluation & selection
Option 1 - download manually:
- Visit: https://www.kaggle.com/datasets/blastchar/telco-customer-churn
- Download the dataset
- Save as: data/raw/churn_data.csv

Option 2 - download via the Kaggle API:

```bash
# Install Kaggle
pip install kaggle

# Set up credentials (~/.kaggle/kaggle.json)
# Then run:
python scripts/download_data.py
```

Development workflow:
1. Prototype in a Jupyter Notebook
   - Explore ideas, try models, test functions, and visualize results.
   - This is the "playground" phase.
2. Move stable code into Python scripts (.py)
   - Once the experiment code works in the notebook, move the clean, reusable parts into Python files.
Components:
- src/components/data_ingestion.py
- src/components/data_validation.py
- src/components/data_preprocessing.py
- src/components/model_trainer.py
- src/components/model_evaluation.py

Pipelines:
- src/pipeline/training_pipeline.py

Utilities:
- scripts/train.py
3. Run Experiments From Scripts
- Running .py files is better for:
- long training jobs
- large experiments
- automated logging
- reproducibility
## 🎯 Run Training Pipeline
### Execute Complete Pipeline
```bash
python scripts/train.py
```
- Data Ingestion: Loads and splits data (80/20)
- Data Validation: Checks schema and quality
- Data Preprocessing: Cleans and transforms features
- Model Training: Trains 4 models with MLflow tracking
- Model Evaluation: Compares models and selects best
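The five stages above run in sequence, each consuming the previous stage's outputs. A dependency-free sketch of how scripts/train.py could chain them; the stage functions and their outputs are illustrative stand-ins, not the project's real components:

```python
# Simplified sketch of a staged training pipeline.
# Stage functions here are stand-ins, not the project's real components.
def run_pipeline(stages):
    results = {}
    for name, stage in stages:
        results[name] = stage(results)  # each stage can read prior outputs
    return results

stages = [
    ("ingestion", lambda r: {"rows": 7043, "split": "80/20"}),
    ("validation", lambda r: {"schema_ok": True}),
    ("preprocessing", lambda r: {"n_features": 30}),
    ("training", lambda r: {"models": ["logreg", "rf", "xgb", "lgbm"]}),
    ("evaluation", lambda r: {"best_model": "xgb"}),
]
results = run_pipeline(stages)
print(results["evaluation"]["best_model"])  # -> xgb
```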
Expected output:

```
======================================================================
TRAINING PIPELINE COMPLETED SUCCESSFULLY!
======================================================================
Best Model: xgboost
Models trained: 4
Preprocessor saved at: artifacts/preprocessors/preprocessor.pkl
Check MLflow UI for detailed experiment tracking:
Run: mlflow ui
Open: http://localhost:5000
======================================================================
```

Start the MLflow UI:

```bash
mlflow ui
```

Open browser: http://localhost:5000

The UI shows:
- All experiment runs
- Parameters for each model
- Metrics (accuracy, precision, recall, F1, ROC-AUC)
- Model artifacts
- Comparison charts
- Recall: ≥ 80% (catch most churners)
- Precision: ≥ 70% (avoid false alarms)
- F1-Score: ≥ 0.75
- ROC-AUC: ≥ 0.85
- Accuracy
- Precision, Recall, F1-Score
- ROC-AUC
- Confusion Matrix
- Specificity, Sensitivity
- Classification Report
```bash
# Check evaluation report
cat artifacts/metrics/evaluation_report.json

# Check validation report
cat artifacts/validation_report.json
```

Configuration options:
- Data paths
- Train/test split ratio
- Feature lists
- Artifact locations
- MLflow settings

Model configuration options:
- Hyperparameters for each model
- Algorithm-specific settings
- Training parameters
Logs are saved in: logs/

```
[2024-11-04 10:30:45] INFO - ChurnPrediction - Starting training pipeline
[2024-11-04 10:30:46] INFO - ChurnPrediction - Data loaded: (7043, 21)
```
```
artifacts/
├── models/
│   ├── logistic_regression.pkl
│   ├── random_forest.pkl
│   ├── xgboost.pkl
│   └── lightgbm.pkl
├── preprocessors/
│   ├── preprocessor.pkl
│   └── preprocessor_label_encoder.pkl
├── metrics/
│   └── evaluation_report.json
└── validation_report.json
```
- Logistic Regression (Baseline)
- Random Forest (Ensemble)
- XGBoost (Gradient Boosting)
- LightGBM (Fast Gradient Boosting)
- Stratified train/test split (80/20)
- Standard scaling for numerical features
- One-hot encoding for categorical features
- Automated hyperparameter configuration
- MLflow tracking for all experiments
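A dependency-free illustration of the two feature transforms listed above; the project itself uses a fitted preprocessor saved under artifacts/preprocessors/:

```python
# Hand-rolled versions of standard scaling and one-hot encoding,
# for illustration only; the project uses a fitted preprocessor.
def standard_scale(values):
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

tenure_scaled = standard_scale([1, 12, 24, 48])  # numerical feature
contract = one_hot("Month-to-month",
                   ["Month-to-month", "One year", "Two year"])  # categorical
```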
✅ FastAPI REST API
✅ Streamlit dashboard
✅ Real-time predictions
✅ Batch processing
Phase 2 adds FastAPI REST API and Streamlit Dashboard for real-time predictions and interactive visualizations.
- Loads trained models for inference
- Handles single and batch predictions
- Calculates risk levels
- Feature importance extraction
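The risk level returned by the API can be derived from the churn probability with simple bucketing. The thresholds below are assumptions for illustration, not values taken from the project source:

```python
# Hypothetical risk-level bucketing; thresholds are assumed, not
# taken from the project's prediction service.
def risk_level(churn_probability: float) -> str:
    if churn_probability >= 0.7:
        return "High"
    if churn_probability >= 0.4:
        return "Medium"
    return "Low"
```

With these assumed thresholds, the sample response below (churn probability 0.7245) would be bucketed as "High".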
- RESTful endpoints for predictions
- Request/response validation with Pydantic
- Auto-generated API documentation
- Health checks and monitoring
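The real API declares its request schema on a pydantic.BaseModel and gets validation plus the auto-generated docs for free. As a dependency-free stand-in, a dataclass with manual checks shows the same idea (fields abbreviated, checks illustrative):

```python
from dataclasses import dataclass

# Dependency-free stand-in for the Pydantic request model; the real
# API would declare these fields on pydantic.BaseModel instead.
@dataclass
class CustomerRequest:
    gender: str
    tenure: int
    MonthlyCharges: float

    def __post_init__(self):
        if self.gender not in ("Male", "Female"):
            raise ValueError(f"invalid gender: {self.gender}")
        if self.tenure < 0:
            raise ValueError("tenure must be non-negative")

ok = CustomerRequest(gender="Female", tenure=12, MonthlyCharges=70.35)
try:
    CustomerRequest(gender="???", tenure=-1, MonthlyCharges=0.0)
    rejected = False
except ValueError:
    rejected = True
```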
- Interactive web interface
- Single customer prediction
- Batch CSV upload
- Visualizations and analytics
```bash
pip install --upgrade pip
pip install fastapi 'uvicorn[standard]' streamlit plotly python-multipart
```

Or install from the updated requirements.txt:

```bash
pip install -r requirements.txt
```

```bash
# If you haven't trained models yet
python scripts/train.py

# Start the API
python run_api.py
```

The API will be available at:
- Main API: http://localhost:8000
- Interactive Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

```bash
python run_streamlit.py
```

The dashboard will open at: http://localhost:8501
GET http://localhost:8000/health

Response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "preprocessor_loaded": true,
  "api_version": "1.0.0"
}
```

POST http://localhost:8000/predict

Request Body:
```json
{
  "customer": {
    "gender": "Female",
    "SeniorCitizen": 0,
    "Partner": "Yes",
    "Dependents": "No",
    "tenure": 12,
    "PhoneService": "Yes",
    "MultipleLines": "No",
    "InternetService": "Fiber optic",
    "OnlineSecurity": "No",
    "OnlineBackup": "Yes",
    "DeviceProtection": "No",
    "TechSupport": "No",
    "StreamingTV": "Yes",
    "StreamingMovies": "No",
    "Contract": "Month-to-month",
    "PaperlessBilling": "Yes",
    "PaymentMethod": "Electronic check",
    "MonthlyCharges": 70.35,
    "TotalCharges": 840.5
  }
}
```

Response:
```json
{
  "prediction": "Yes",
  "prediction_label": 1,
  "churn_probability": 0.7245,
  "no_churn_probability": 0.2755,
  "confidence": 0.7245,
  "risk_level": "High"
}
```

POST http://localhost:8000/predict/batch

Request Body:
```json
{
  "customers": [
    { /* customer 1 data */ },
    { /* customer 2 data */ }
  ]
}
```

Response:
```json
{
  "predictions": [ /* array of predictions */ ],
  "total_customers": 2,
  "high_risk_count": 1
}
```

Additional endpoints:
- GET http://localhost:8000/model/info
- GET http://localhost:8000/model/feature-importance

Interactive docs:
- Start the API server: python run_api.py
- Open browser: http://localhost:8000/docs
- Try out endpoints directly in the browser
Run the test script:

```bash
python test_api.py
```

Or use curl:

```bash
# Health check
curl http://localhost:8000/health

# Single prediction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @sample_request.json
```

Or call the API from Python:

```python
import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={"customer": {...}},  # customer data as in the request body above
)
print(response.json())
```

✅ Docker images (API, Streamlit, Training)
✅ Docker Compose orchestration
✅ Multi-stage builds
✅ Pushed to Docker Hub
Phase 3 containerizes the entire application using Docker, enabling consistent deployments across any environment.
- API Image - FastAPI service
- Streamlit Image - Web dashboard
- Training Image - Model training pipeline
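As a rough sketch, Dockerfile.api might look like the following; the base image, copied paths, and app module (api.main:app) are assumptions for illustration, not the repository's actual file:

```dockerfile
# Hypothetical sketch of docker/Dockerfile.api -- paths and the
# app module are assumptions, not the project's actual file.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ src/
COPY api/ api/
COPY artifacts/ artifacts/
EXPOSE 8000
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```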
New files:
- docker/Dockerfile.api - API container definition
- docker/Dockerfile.streamlit - Streamlit container
- docker/Dockerfile.training - Training container
- docker-compose.yml - Multi-container orchestration
- .dockerignore - Exclude unnecessary files
- .env.example - Environment template
- Build & push scripts
```
mlops-churn-prediction/
├── docker/
│   ├── Dockerfile.api          # NEW - API container
│   ├── Dockerfile.streamlit    # NEW - Streamlit container
│   └── Dockerfile.training    # NEW - Training container
│
├── docker-compose.yml          # NEW - Orchestration
├── .dockerignore               # NEW - Exclude files
├── .env.example                # NEW - Environment template
│
└── scripts/
    ├── build_images.sh/bat     # NEW - Build script
    ├── push_images.sh/bat      # NEW - Push to Docker Hub
    └── run_docker.sh/bat       # NEW - Run containers (.sh for Mac/Linux)
```

```bash
# Install Docker Desktop
# Download from: https://www.docker.com/products/docker-desktop

# Verify installation
docker --version
docker-compose --version
```

```bash
# Edit .env file
# Set DOCKER_USERNAME to your Docker Hub username
nano .env  # or use your favorite editor
```

Example .env:

```
DOCKER_USERNAME=yourusername
VERSION=v1.0.0
```

Make the scripts executable:

```bash
chmod +x scripts/build_images.sh
chmod +x scripts/push_images.sh
chmod +x scripts/run_docker.sh
```

Build the images.

Linux/Mac:

```bash
./scripts/build_images.sh
```

Windows:

```bash
scripts\build_images.bat
```

This builds all 3 Docker images (~5-10 minutes the first time).
```bash
# Start all services
docker-compose up -d

# Or use the helper script
./scripts/run_docker.sh start
```

Services:
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Streamlit: http://localhost:8501
- MLflow: http://localhost:5000
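A hypothetical docker-compose.yml matching the services above might look like this; image names follow the conventions in this guide, and the volume mounts and MLflow image are assumptions:

```yaml
# Hypothetical docker-compose.yml sketch -- not copied from the repo.
services:
  api:
    image: ${DOCKER_USERNAME}/churn-prediction-api:${VERSION}
    ports: ["8000:8000"]
    volumes: ["./artifacts:/app/artifacts", "./logs:/app/logs"]
  streamlit:
    image: ${DOCKER_USERNAME}/churn-prediction-streamlit:${VERSION}
    ports: ["8501:8501"]
    depends_on: [api]
  mlflow:
    image: ghcr.io/mlflow/mlflow
    ports: ["5000:5000"]
    command: mlflow ui --host 0.0.0.0
```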
```
┌──────────────────────────────────────────┐
│          Docker Compose Network          │
│                                          │
│  ┌──────────┐ ┌──────────┐ ┌─────────┐   │
│  │   API    │ │Streamlit │ │ MLflow  │   │
│  │  :8000   │ │  :8501   │ │  :5000  │   │
│  └────┬─────┘ └────┬─────┘ └────┬────┘   │
│       │            │            │        │
│       └────────────┴────────────┘        │
│              Shared Network              │
└──────────────────────────────────────────┘
        │                    │
    [Volumes]            [Volumes]
    artifacts/           mlflow/
    logs/                data/
```
```
┌──────────────────────┐
│ Dockerfile.training  │ ──build──> 🐳 training-image
└──────────────────────┘
┌──────────────────────┐
│ Dockerfile.streamlit │ ──build──> 🐳 streamlit-image
└──────────────────────┘
┌──────────────────────┐
│ Dockerfile.api       │ ──build──> 🐳 api-image
└──────────────────────┘
          │
          ▼
┌──────────────────────┐
│   docker-compose     │ ──orchestrates──> 🚀 All containers running together
└──────────────────────┘
```
Docker Hub setup:
- Go to https://hub.docker.com
- Sign up for a free account
- Create an access token (Settings → Security → New Access Token)

```bash
docker login
# Enter username and password/token
```

Push the images.

Linux/Mac:

```bash
./scripts/push_images.sh
```

Windows:

```bash
scripts\push_images.bat
```

Manual push:

```bash
docker push username/churn-prediction-api:latest
docker push username/churn-prediction-streamlit:latest
docker push username/churn-prediction-training:latest
```

Verify at: https://hub.docker.com/u/yourusername
Change the API port - edit docker-compose.yml:

```yaml
services:
  api:
    ports:
      - "8080:8000"  # Change 8080 to your port
```

Add environment variables:

```yaml
services:
  api:
    environment:
      - CUSTOM_VAR=value
      - LOG_LEVEL=DEBUG
```

Change the model - edit .env:

```
MODEL_PATH=artifacts/models/random_forest.pkl
```

Set resource limits:

```yaml
services:
  api:
    deploy:
      resources:
        limits:
          cpus: "1"
          memory: 1G
```

Test the API:

```bash
curl http://localhost:8000/health

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d @test_data.json
```

Test the dashboard - open browser: http://localhost:8501
Test MLflow - open browser: http://localhost:5000

Run the training container:

```bash
docker-compose --profile training up training
```

Monitor resources:

```bash
# Real-time stats
docker stats

# Specific container
docker stats churn-prediction-api
```

View logs:

```bash
# All logs
docker-compose logs

# Follow logs
docker-compose logs -f

# Last 100 lines
docker-compose logs --tail=100
```

Check container health:

```bash
# Check health
docker ps

# Inspect health
docker inspect churn-prediction-api | grep Health -A 10
```

Deploy on any machine with Docker:

```bash
# Pull images
docker pull username/churn-prediction-api:latest
docker pull username/churn-prediction-streamlit:latest

# Run
docker-compose up -d
```

Use environment-specific overrides:

```bash
# Development
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up

# Production
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up
```

✅ Unit tests (70%+ coverage)
✅ Integration tests
✅ Data quality tests
✅ Model performance tests
✅ Automated test suite
Phase 4 implements a comprehensive testing framework covering unit tests, integration tests, data validation, model performance, and API testing.
- Unit Tests - Individual component testing
- Integration Tests - Pipeline and API testing
- Data Tests - Data quality and validation
- Model Tests - Performance and fairness testing
New files:
- pytest.ini - Pytest configuration
- .coveragerc - Coverage settings
- tests/conftest.py - Shared fixtures
- tests/unit/test_data_ingestion.py
- tests/unit/test_prediction_pipeline.py
- tests/integration/test_api_endpoints.py
- tests/data/test_data_quality.py
- tests/model/test_model_performance.py
- scripts/run_tests.sh - Test runner (Linux/Mac)
- scripts/run_tests.bat - Test runner (Windows)

Install the test dependencies:

```bash
pip install pytest pytest-cov pytest-mock pytest-timeout
```

Or use the existing requirements.txt:

```bash
pip install -r requirements.txt
```

Run the tests:

```bash
# Linux/Mac
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh all

# Windows
scripts\run_tests.bat all

# Or directly with pytest
pytest -v
```

Generate a coverage report:

```bash
./scripts/run_tests.sh coverage

# Then open in browser
open htmlcov/index.html      # Mac
xdg-open htmlcov/index.html  # Linux
start htmlcov\index.html     # Windows
```

Unit tests cover individual components in isolation.
Run:

```bash
pytest -m unit -v
./scripts/run_tests.sh unit
```

Tests:
- Data ingestion logic
- Data preprocessing
- Model training components
- Prediction pipeline
- Utility functions
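A hypothetical unit test in the spirit of tests/unit/test_data_ingestion.py; the real tests and fixtures live in the repo, and the split helper here is a stand-in:

```python
# Hypothetical unit test -- the split helper is a stand-in for the
# real ingestion component, not the project's actual code.
def train_test_split_sizes(n_rows, test_ratio=0.2):
    n_test = int(n_rows * test_ratio)
    return n_rows - n_test, n_test

def test_split_is_80_20():
    n_train, n_test = train_test_split_sizes(7043)  # Telco dataset size
    assert n_train + n_test == 7043
    assert abs(n_test / 7043 - 0.2) < 0.01
```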
Example:

```bash
pytest tests/unit/test_data_ingestion.py -v
```

Integration tests cover complete workflows and API endpoints.
Run:

```bash
pytest -m integration -v
./scripts/run_tests.sh integration
```

Tests:
- End-to-end training pipeline
- API endpoint responses
- Service communication
- Complete prediction workflow

Example:

```bash
pytest tests/integration/test_api_endpoints.py -v
```

Data tests validate data quality and integrity.
Run:

```bash
pytest -m data -v
./scripts/run_tests.sh data
```

Tests:
- Missing values
- Data types
- Valid categories
- Numerical ranges
- Data consistency
- Class distribution

Example:

```bash
pytest tests/data/test_data_quality.py -v
```

Model tests ensure the model meets performance requirements.
Run:

```bash
pytest -m model -v
./scripts/run_tests.sh model
```

Tests:
- Minimum accuracy (60%)
- Minimum precision (50%)
- Minimum recall (50%)
- Minimum F1 score (55%)
- Minimum ROC-AUC (70%)
- No constant predictions
- Fairness across groups
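A hypothetical performance-gate check mirroring the thresholds above, run against synthetic predictions rather than a trained model:

```python
# Hypothetical model-performance gate; predictions below are
# synthetic, not from the project's trained model.
def passes_performance_gate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    not_constant = len(set(y_pred)) > 1  # reject degenerate predictors
    return acc >= 0.60 and precision >= 0.50 and recall >= 0.50 and not_constant

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
```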
Example:

```bash
pytest tests/model/test_model_performance.py -v
```

Tests are organized using pytest markers:
```bash
# Run specific marker
pytest -m unit
pytest -m integration
pytest -m data
pytest -m model
pytest -m api
pytest -m slow
pytest -m requires_model

# Combine markers
pytest -m "unit and not slow"
pytest -m "integration or api"

# Exclude markers
pytest -m "not slow"
```

Check coverage:

```bash
pytest --cov=src --cov=api --cov-report=term-missing
```

Report formats:
- Terminal - Summary in console
- HTML - Detailed report in htmlcov/
- XML - For CI/CD integration

```bash
# Generate report
pytest --cov=src --cov=api --cov-report=html

# Open in browser
open htmlcov/index.html
```

Useful pytest commands:

```bash
# Run all tests
pytest

# Verbose output
pytest -v

# Show print statements
pytest -s

# Stop on first failure
pytest -x

# Run last failed tests
pytest --lf

# Run specific file
pytest tests/unit/test_data_ingestion.py

# Run specific test
pytest tests/unit/test_data_ingestion.py::test_function_name
```

Test runner script options:

```bash
./scripts/run_tests.sh all          # All tests
./scripts/run_tests.sh unit         # Unit tests
./scripts/run_tests.sh integration  # Integration tests
./scripts/run_tests.sh data         # Data tests
./scripts/run_tests.sh model        # Model tests
./scripts/run_tests.sh api          # API tests
./scripts/run_tests.sh fast         # Exclude slow tests
./scripts/run_tests.sh coverage     # With coverage
./scripts/run_tests.sh ci           # CI pipeline
./scripts/run_tests.sh clean        # Clean artifacts
```

Profile the test suite:

```bash
# Show slowest tests
pytest --durations=10

# Set timeout
pytest --timeout=60
```

Run tests in parallel:

```bash
# Install plugin
pip install pytest-xdist

# Run in parallel
pytest -n auto
```

✅ GitHub Actions workflows
✅ Automated testing
✅ Docker builds & pushes
✅ Security scanning
✅ Code quality checks
✅ Deployment automation
Phase 5 implements comprehensive CI/CD pipelines using GitHub Actions for automated testing, building, security scanning, and deployment.
Workflows:
- CI Pipeline (ci.yml) - Automated testing & validation
- Docker Build (docker-build.yml) - Build & push Docker images
- Code Quality (code-quality.yml) - Linting & formatting
- Security (security.yml) - Vulnerability scanning
- Deployment (deploy.yml) - AWS EKS deployment

Configuration files:
- .pre-commit-config.yaml - Pre-commit hooks
- .flake8 - Flake8 linting config
- pyproject.toml - Black, isort, mypy config
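A plausible .pre-commit-config.yaml wiring up the formatting and linting hooks listed above; pin rev to whatever versions the project actually uses:

```yaml
# Hypothetical .pre-commit-config.yaml -- revisions are examples,
# not the project's pinned versions.
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
```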
```bash
# Initialize git (if not already)
git init
git add .
git commit -m "Initial commit"

# Create GitHub repository and push
git remote add origin https://github.com/yourusername/your-repo.git
git push -u origin main
```

Go to: Settings → Secrets and variables → Actions

Add these secrets:
- DOCKER_USERNAME - Your Docker Hub username
- DOCKER_PASSWORD - Your Docker Hub password/token
- AWS_ACCESS_KEY_ID - AWS access key
- AWS_SECRET_ACCESS_KEY - AWS secret key

GitHub Actions should be enabled by default. Verify at: Settings → Actions → General

Install the pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
```

Push to trigger the pipeline:

```bash
git add .
git commit -m "Setup CI/CD pipeline"
git push
```

GitHub Actions will trigger automatically! 🚀
```
┌─────────────────────────────────────────────┐
│           Push/PR to main/develop           │
└──────────────────┬──────────────────────────┘
                   │
        ┌──────────┴──────────┐
        │                     │
  ┌──────────┐          ┌──────────┐
  │   Test   │          │   Lint   │
  └────┬─────┘          └────┬─────┘
       │                     │
       └──────────┬──────────┘
                  │
          ┌───────▼───────┐
          │   Security    │
          └───────┬───────┘
                  │
          ┌───────▼───────┐
          │ Docker Build  │
          └───────┬───────┘
                  │
        ┌─────────▼─────────┐
        │   Deploy (Tag)    │
        └───────────────────┘
```
Triggers:
- Push to main or develop
- Pull requests to main or develop
Jobs:
- ✅ Run tests (Python 3.9, 3.10, 3.11)
- ✅ Check code quality (flake8, black, isort)
- ✅ Security scanning (Bandit, Safety)
- ✅ Test Docker build
- ✅ Upload coverage to Codecov
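A condensed, hypothetical ci.yml showing how these jobs could be wired up; job and step names are illustrative, not copied from the repo:

```yaml
# Hypothetical condensed ci.yml -- illustrative, not the repo's file.
name: CI
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt
      - run: pytest --cov=src --cov=api
```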
Usage:

```bash
# Automatically runs on push
git push origin main
```

View results at: https://github.com/youruser/yourrepo/actions

Docker Build triggers:
- Push to main
- Tags (v*)
- Manual trigger
- Pull requests (build only, no push)

Jobs:
- ✅ Build API, Streamlit, Training images
- ✅ Multi-arch builds (amd64, arm64)
- ✅ Push to Docker Hub with tags
- ✅ Scan images with Trivy
- ✅ Verify images
- ✅ Cleanup old images

Tags created:
- latest - Latest main branch
- v1.0.0 - Specific version
- main-abc123 - Git commit SHA
- pr-42 - Pull request number
Checks:
- ✅ Black formatting
- ✅ isort import sorting
- ✅ Flake8 linting
- ✅ Pylint analysis
- ✅ MyPy type checking
- ✅ Cyclomatic complexity
- ✅ Docstring coverage
- ✅ Dependency licenses

Auto-fix:
- Automatically fixes formatting on PRs
- Commits fixes back to the branch

Scans:
- ✅ Dependency vulnerabilities (Safety, pip-audit)
- ✅ Code security (Bandit, Semgrep)
- ✅ Secret detection (Gitleaks, TruffleHog)
- ✅ SAST (CodeQL)
- ✅ Docker image scanning (Trivy, Grype)
- ✅ Compliance checks

Schedule:
- Runs on every push
- Weekly full scan (Sunday midnight)
Deploying a Docker image from Docker Hub to AWS ECS using ECR, Fargate, and GitHub Actions.
- Setting up AWS infrastructure (ECR, ECS cluster, task definition, service)
- Configuring AWS credentials in GitHub
- Creating a GitHub Actions workflow to automate deployment
Note: The process is explained in detail in DESC.md
https://github.com/FraidoonOmarzai/Chunk_Prediction-MLOps-/actions
✅ Local Kubernetes development
✅ AWS EKS cluster
✅ Kubernetes orchestration
✅ Load balancing
✅ High availability
Enable Kubernetes in Docker Desktop:

1. Open Docker Desktop
2. Go to Settings / Preferences
3. Select Kubernetes
4. Check "Enable Kubernetes"
5. Click Apply & Restart

Also install kubectl (Docker Desktop installs kubectl automatically).

Verify the cluster is running:

```bash
kubectl cluster-info
kubectl get nodes
```

Apply your Kubernetes manifests:

```bash
# Create namespace
kubectl apply -f namespace.yaml

# Deploy API
kubectl apply -f api.yaml

# Deploy Streamlit frontend
kubectl apply -f streamlit.yaml
```

Check the deployment status:

```bash
# Watch pods starting up
kubectl get pods -n churn-prediction -w

# Check services
kubectl get svc -n churn-prediction

# Check deployment details
kubectl get deployments -n churn-prediction
```
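For reference, a condensed, hypothetical k8s/api.yaml consistent with the service names and ports used in this guide (2 replicas, service port 80 to container port 8000); the image name and labels are assumptions:

```yaml
# Hypothetical condensed k8s/api.yaml -- image name, labels, and
# replica count follow this README's conventions, not the repo's file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: churn-prediction
spec:
  replicas: 2
  selector:
    matchLabels: {app: api}
  template:
    metadata:
      labels: {app: api}
    spec:
      containers:
        - name: api
          image: yourusername/churn-prediction-api:latest
          ports: [{containerPort: 8000}]
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: churn-prediction
spec:
  selector: {app: api}
  ports: [{port: 80, targetPort: 8000}]
  type: LoadBalancer
```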
Port-forward the services:

```bash
kubectl port-forward -n churn-prediction svc/streamlit-service 8501:80
kubectl port-forward -n churn-prediction svc/api-service 8000:80
```

Access your app:
- Streamlit UI: http://localhost:8501
- API: http://localhost:8000
- API Health: http://localhost:8000/health
## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────┐
│                   AWS EKS Cluster                   │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │          churn-prediction namespace           │  │
│  │                                               │  │
│  │  ┌────────────────┐    ┌─────────────────┐    │  │
│  │  │  API Service   │    │  Streamlit App  │    │  │
│  │  │  (2 replicas)  │◄───┤  (2 replicas)   │    │  │
│  │  │                │    │                 │    │  │
│  │  │  Port: 8000    │    │  Port: 8501     │    │  │
│  │  └───────┬────────┘    └────────┬────────┘    │  │
│  │          │                      │             │  │
│  │  ┌───────▼────────┐    ┌────────▼────────┐    │  │
│  │  │  LoadBalancer  │    │  LoadBalancer   │    │  │
│  │  │   (AWS NLB)    │    │   (AWS NLB)     │    │  │
│  │  └────────────────┘    └─────────────────┘    │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
                        │
            ┌───────────▼───────────┐
            │      Docker Hub       │
            │                       │
            │  - API Image          │
            │  - Streamlit Image    │
            └───────────────────────┘
```
## 📦 Components

- API Service: FastAPI backend for churn prediction
- Streamlit App: Interactive web interface
- Docker Hub: Container image registry
- AWS EKS: Managed Kubernetes service
- GitHub Actions: CI/CD automation
## 📁 Repository Structure

```
.
├── .github/
│   └── workflows/
│       └── eks_deploy.yml    # GitHub Actions CI/CD pipeline
├── k8s/
│   ├── namespace.yaml        # Kubernetes namespace
│   ├── api.yaml              # API deployment and service
│   ├── streamlit.yaml        # Streamlit deployment and service
│   └── eks_deploy.sh         # Manual deployment script
├── eks-cluster.yaml          # EKS cluster configuration
└── README.md                 # This file
```
- AWS Account
- AWS CLI configured
- kubectl installed
- eksctl installed
- Create EKS Cluster:

```bash
eksctl create cluster -f eks-cluster.yaml
```

- Deploy Application

Option A: Using GitHub Actions (automated)
1. Fork this repository
2. Add GitHub Secrets (see the Configuration section)
3. Push to the main branch
4. GitHub Actions will automatically deploy

Option B: Using the deploy script (manual)

```bash
chmod +x k8s/eks_deploy.sh
./k8s/eks_deploy.sh
```

Option C: Using kubectl (manual)

```bash
aws eks update-kubeconfig --name churn-prediction-cluster --region us-east-1
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/api.yaml
kubectl apply -f k8s/streamlit.yaml
```




