This project is a machine learning application that classifies emails as Spam or Not Spam using natural language processing techniques.
It demonstrates the use of text preprocessing, feature extraction (TF-IDF), and classification models.
- Preprocesses raw email text (cleaning, tokenization, stopword removal).
- Converts text into numerical features using TF-IDF vectorization.
- Classifies emails using Logistic Regression.
- Achieves high accuracy on benchmark datasets.
- Can be extended with other ML models (Naive Bayes, SVM, etc.).
- Language: Python
- Libraries: pandas, NumPy, scikit-learn
- Algorithm: TF-IDF + Logistic Regression
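
The preprocessing, TF-IDF, and Logistic Regression steps listed above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the project's actual code: the `clean_text` helper, the toy `emails` and `labels`, and the choice of scikit-learn's built-in English stopword list are all assumptions made for the example.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def clean_text(text: str) -> str:
    """Hypothetical cleaning step: lowercase, keep letters only, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Toy corpus written for this example only (1 = Spam, 0 = Not Spam).
emails = [
    "WIN a FREE prize now!!! Click here",
    "Claim your free money today",
    "Cheap loans, limited offer, act now",
    "Meeting moved to 3pm tomorrow",
    "Please review the attached project report",
    "Lunch on Friday with the team?",
]
labels = [1, 1, 1, 0, 0, 0]

# TfidfVectorizer handles tokenization and stopword removal;
# clean_text replaces its default preprocessing step.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text, stop_words="english"),
    LogisticRegression(),
)
model.fit(emails, labels)

# Classify two unseen messages; each prediction is 1 (Spam) or 0 (Not Spam).
predictions = model.predict(
    ["Free prize money, claim now", "Notes from the project meeting"]
)
print(predictions)
```

Because the classifier is the last step of the pipeline, trying another model should only require replacing `LogisticRegression()` with, for example, `MultinomialNB()` from `sklearn.naive_bayes` or `LinearSVC()` from `sklearn.svm`.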