A production-grade machine learning system designed to robustly classify emails as "Spam" or "Ham" (legitimate). This project features a modular pipeline architecture for training and inference, integrated with a modern Streamlit user interface for easy interaction.
- Advanced ML Pipeline: Modular design separating data ingestion, transformation, and model training.
- Multiple Model Support: Evaluation of various algorithms, including SVM, Logistic Regression, Decision Trees, and Random Forest.
- Interactive Web UI: Built with Streamlit for real-time single-email analysis and batch processing.
- MBOX Support: Native capability to process and classify entire `.mbox` email archives.
- Detailed Analytics: Comprehensive logging and performance metrics (Precision, Recall, F1-Score).
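The MBOX support above relies on parsing standard mbox archives. A minimal sketch of how such an archive can be read with Python's standard `mailbox` module (the function name and returned fields here are illustrative; the project's actual ingestion code may differ):

```python
import mailbox


def read_mbox(path: str) -> list[dict]:
    """Extract subject and plain-text body from each message in an mbox archive."""
    emails = []
    for msg in mailbox.mbox(path):
        # Walk multipart messages and keep only the text/plain parts.
        parts = []
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True)
                if payload:
                    parts.append(payload.decode("utf-8", errors="replace"))
        emails.append({"subject": msg.get("subject", ""), "body": "\n".join(parts)})
    return emails
```

Each returned dict can then be fed through the same cleaning and vectorization steps used for single emails.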
- Language: Python 3.10+
- Frontend: Streamlit
- ML Framework: Scikit-learn
- Data Processing: Pandas, NumPy, BeautifulSoup4
- Project Management: `uv` (recommended) or `pip`
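BeautifulSoup4 in the stack above suggests that HTML email bodies are stripped down to plain text before vectorization. A minimal sketch of that kind of cleanup (the function name and exact normalization rules are illustrative, not the project's actual code):

```python
import re

from bs4 import BeautifulSoup


def clean_email_text(raw: str) -> str:
    """Strip HTML tags, collapse whitespace, and lowercase for vectorization."""
    text = BeautifulSoup(raw, "html.parser").get_text(separator=" ")
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace/newlines
    return text.strip().lower()
```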
```
├── app.py              # Main Streamlit Web Application
├── requirements.txt    # Project dependencies
├── main.py             # (Optional) Alternative entry point
├── src/
│   ├── components/     # Core processing modules (Ingestion, Transformation)
│   ├── pipeline/       # Orchestration pipelines (Training, Prediction)
│   ├── config/         # Configuration and parameters
│   └── utils/          # Helper functions, logging, and state management
├── data/               # Dataset storage (inputs)
├── outputs/            # Training artifacts (models, vectorizers)
└── logs/               # System runtime logs
```
- Clone the Repository

  ```bash
  git clone <repository_url>
  cd Spam-Email-Detection
  ```

- Set up Environment: it is recommended to use a virtual environment.

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install Dependencies

  ```bash
  pip install -r requirements.txt
  ```
Launch the interactive dashboard to classify emails instantly.

```bash
streamlit run app.py
```

- Single Email Tab: Paste email content to get an immediate Spam/Ham prediction with a confidence score.
- Batch Processing Tab: Upload an `.mbox` file to process multiple emails at once and download the results as a CSV.
(Optional) If you wish to retrain the models on new data:
- Place your dataset in `data/dataset/dataset.csv`.
- Run the training pipeline:

  ```bash
  python -m src.pipeline.training_pipeline
  ```

- Artifacts (model and vectorizer) will be saved in the `outputs/` directory.
- Important: Update `src/config/config.py` with the new paths to your generated model and vectorizer if they change.
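The retraining steps above can be sketched end to end. Assuming a CSV with `text` and `label` columns (the column names, model choice, and helper below are illustrative; the real pipeline lives in `src/pipeline/training_pipeline.py`):

```python
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def train(csv_path: str, model_path: str, vectorizer_path: str) -> None:
    """Fit a TF-IDF + linear SVM classifier and persist both artifacts."""
    df = pd.read_csv(csv_path)
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(df["text"])
    model = LinearSVC()
    model.fit(X, df["label"])
    # Save both artifacts so the prediction pipeline can reload them together.
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)
```

Saving the vectorizer alongside the model matters: predictions must use the exact vocabulary learned at training time.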
The system is highly configurable via `src/config/config.py`. You can adjust:
- Model hyperparameters (Grid Search configuration)
- Input/Output paths
- Training parameters (Cross-validation folds, etc.)
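The exact contents of `src/config/config.py` are project-specific; a hypothetical sketch of the kind of settings listed above (all names, paths, and values here are illustrative):

```python
# src/config/config.py -- illustrative shape only, not the project's actual file.

# Input/Output paths
DATA_PATH = "data/dataset/dataset.csv"
MODEL_PATH = "outputs/model.joblib"
VECTORIZER_PATH = "outputs/vectorizer.joblib"

# Training parameters
CV_FOLDS = 5  # cross-validation folds used during model selection

# Hyperparameter grids for grid search, keyed by model name.
PARAM_GRIDS = {
    "svm": {"C": [0.1, 1, 10]},
    "random_forest": {"n_estimators": [100, 300], "max_depth": [None, 20]},
}
```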
The pipeline automatically evaluates models using 5-fold cross-validation. Accuracy, Precision, Recall, and F1-Score are logged for each experiment. By default, the system selects the best-performing model (often SVM or Random Forest) for inference.
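The selection logic described above can be sketched with scikit-learn's cross-validation utilities. This is a minimal sketch, not the project's actual code: it compares the four listed algorithms by macro F1 and returns the winner, while the real pipeline also logs accuracy, precision, and recall and applies its own preprocessing.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def select_best_model(texts, labels, folds: int = 5):
    """Return (name, mean macro-F1) of the best model under k-fold CV."""
    candidates = {
        "svm": LinearSVC(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(),
        "random_forest": RandomForestClassifier(n_estimators=100),
    }
    scores = {}
    for name, clf in candidates.items():
        # Vectorize inside the pipeline so each fold fits its own vocabulary,
        # avoiding leakage from the held-out fold into TF-IDF statistics.
        pipe = make_pipeline(TfidfVectorizer(), clf)
        scores[name] = cross_val_score(
            pipe, texts, labels, cv=folds, scoring="f1_macro"
        ).mean()
    return max(scores.items(), key=lambda kv: kv[1])
```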
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.