🎬 Sentiment Analysis – Classic NLP Pipeline

A production-ready sentiment analysis system built using classical NLP techniques on the IMDB Movie Reviews dataset.
This project demonstrates an end-to-end NLP workflow — from raw text preprocessing to model training, evaluation, and real-time inference via API and UI.

⚠️ No transformers were used.
This project intentionally focuses on strong NLP fundamentals before moving to modern LLM-based systems.

UI LOOK

---

🚀 Project Overview

The system classifies movie reviews as Positive or Negative using a TF-IDF + Logistic Regression pipeline and exposes predictions through:

✅ A FastAPI REST API
✅ An interactive Streamlit UI

This project is part of my Pre-Transformer NLP Project Series, designed to build deep intuition for text pipelines and production ML systems.

🎯 Objectives

Build a classical NLP pipeline from scratch
Perform robust text preprocessing
Extract features using TF-IDF
Train and evaluate a machine-learning classifier
Persist and reload model artifacts safely
Serve predictions via a REST API
(Optional) Provide a human-friendly UI

🧠 System Design & Approach

🔹 Text Preprocessing

Lowercasing
HTML tag removal
URL removal
Punctuation & digit removal
Stopword removal (NLTK)
Stemming (Porter Stemmer)
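
The steps above could be chained roughly as follows. This is a minimal sketch, not the contents of `src/preprocess.py`: the function name `clean_text`, the regular expressions, and the tiny `DEMO_STOPWORDS` set are assumptions made for illustration. The project itself uses NLTK's stopword list and Porter stemmer, which can be supplied through the `stopwords` and `stem` parameters.

```python
import re

# Tiny illustrative stopword set (assumption); the project uses NLTK's English list.
DEMO_STOPWORDS = frozenset(
    {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this"}
)

HTML_TAG = re.compile(r"<[^>]+>")            # e.g. <br />
URL = re.compile(r"https?://\S+|www\.\S+")   # bare and http(s) links
NON_LETTER = re.compile(r"[^a-z\s]")         # punctuation and digits


def clean_text(text, stopwords=DEMO_STOPWORDS, stem=None):
    """Lowercase, strip HTML/URLs/punctuation/digits, drop stopwords, optionally stem."""
    text = text.lower()
    text = HTML_TAG.sub(" ", text)    # tags first, so a URL match cannot swallow "<br"
    text = URL.sub(" ", text)
    text = NON_LETTER.sub(" ", text)
    tokens = [t for t in text.split() if t not in stopwords]
    if stem is not None:
        tokens = [stem(t) for t in tokens]
    return " ".join(tokens)


print(clean_text("This movie was <br />GREAT!!! Visit http://example.com 10/10"))
# prints "movie great visit"
```

To match the pipeline described here, one would likely pass `stopwords=set(nltk.corpus.stopwords.words("english"))` (after `nltk.download("stopwords")`) and `stem=nltk.stem.PorterStemmer().stem`.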

🔹 Feature Extraction

TF-IDF Vectorization
Unigrams + Bigrams
Feature cap for efficiency & generalization

🔹 Model

Logistic Regression (binary classification)
Probability-based confidence scores
Lightweight, fast, and interpretable
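
One way the confidence score could be derived is by taking the highest class probability from `predict_proba`. The sketch below trains on four made-up reviews; the helper name `predict_with_confidence` and the string labels are assumptions, not the project's actual code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (made up for illustration only).
train_texts = [
    "wonderful story and fantastic cast",
    "great movie loved it",
    "boring story and terrible cast",
    "awful movie hated it",
]
train_labels = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_texts, train_labels)


def predict_with_confidence(text):
    """Return (label, probability of that label) for a single review."""
    proba = clf.predict_proba([text])[0]
    best = proba.argmax()
    return str(clf.classes_[best]), float(proba[best])


label, confidence = predict_with_confidence("fantastic and wonderful")
print(label, round(confidence, 2))
```

With so little data the probabilities stay close to 0.5; on the full dataset they spread out much further. For interpretability, the learned weights are available as `clf[-1].coef_`, aligned with the vectorizer's feature names.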

📂 Project Structure


project-sentiment/
│
├── data/
│   ├── imdb.csv
│   └── imdb_clean.csv
│
├── models/
│   ├── sentiment_model.joblib
│   └── vectorizer.joblib
│
├── src/
│   ├── preprocess.py
│   ├── train.py
│   └── predict.py
│
├── app.py              # Streamlit UI
├── requirements.txt
└── README.md

📦 Installation

1️⃣ Create virtual environment

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

2️⃣ Install dependencies

pip install -r requirements.txt

🏋️ Training the Model

Ensure data/imdb_clean.csv exists.

python -m src.train

This will:

Train the sentiment classifier
Evaluate performance
Save model artifacts:
- models/sentiment_model.joblib
- models/vectorizer.joblib
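
A training step along these lines might look like the sketch below. It is not the actual `src/train.py`: the function name `train_and_save`, the column names `review` and `sentiment`, the 80/20 split, and the feature cap are all assumptions.

```python
from pathlib import Path

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_and_save(df, model_dir="models", text_col="review", label_col="sentiment"):
    """Fit TF-IDF + Logistic Regression, save both artifacts, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[text_col], df[label_col],
        test_size=0.2, random_state=42, stratify=df[label_col],
    )
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(X_train), y_train)

    # Transform (not fit) the held-out split so no test vocabulary leaks in.
    accuracy = model.score(vectorizer.transform(X_test), y_test)

    out = Path(model_dir)
    out.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out / "sentiment_model.joblib")
    joblib.dump(vectorizer, out / "vectorizer.joblib")
    return accuracy
```

Under these assumptions it would be called as `train_and_save(pd.read_csv("data/imdb_clean.csv"))`. Since joblib files are pickle-based, they should only be loaded from sources you trust.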

📊 Evaluation Metrics (Sample)

Accuracy: ~0.89
F1 Score: ~0.89
Confusion Matrix:
[[TN FP]
 [FN TP]]

(Exact values may vary slightly due to randomness.)
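
For reference, these metrics can be computed with scikit-learn as sketched below. The labels are a made-up toy example (1 = positive, 0 = negative, an assumed encoding), not results from this project.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy ground truth and predictions (illustrative only).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)  # F1 of the positive class (label 1)

# Rows are true labels, columns are predicted labels: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"accuracy={accuracy:.2f} f1={f1:.2f} TN={tn} FP={fp} FN={fn} TP={tp}")
# prints "accuracy=0.75 f1=0.75 TN=3 FP=1 FN=1 TP=3"
```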

⚡ API Inference (FastAPI)

Start the server

uvicorn src.predict:app --reload

Open Swagger UI:

http://127.0.0.1:8000/docs

Example CURL Request

curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d "{\"text\": \"This movie was absolutely fantastic\"}"

Example Response

{
  "sentiment": "positive",
  "confidence": 0.97
}

Example String

Example String Output

---

🖥️ Interactive UI (Streamlit)

Run:

streamlit run app.py

Provides a simple UI to test predictions interactively.

🧪 Example Predictions

| Review | Prediction |
| --- | --- |
| Amazing acting and storyline | Positive |
| Boring movie, waste of time | Negative |

🧰 Tech Stack

Python
NLTK
scikit-learn
FastAPI
Uvicorn
Streamlit
Pandas / NumPy

🧠 Learning Outcomes

Built a production-style NLP system
Understood classical NLP pipelines end-to-end
Learned artifact management & safe loading
Deployed ML inference via REST API
Created a user-facing ML demo UI

🔮 Future Improvements

Compare TF-IDF vs Word2Vec / GloVe
Replace Logistic Regression with LightGBM
Add batch inference & logging
Upgrade to Transformer-based model
Deploy to cloud (Render / Hugging Face Spaces)

👤 Author

Tanish Sarkar
Pre-Transformer NLP Projects

Tanish-Sarkar/Sentiment-Analysis
