
Production-ready sentiment analysis system using classical NLP (TF-IDF + Logistic Regression) with FastAPI and Streamlit. Part of my pre-transformer NLP project series.


Tanish-Sarkar/Sentiment-Analysis



🎬 Sentiment Analysis – Classic NLP Pipeline

A production-ready sentiment analysis system built using classical NLP techniques on the IMDB Movie Reviews dataset.
This project demonstrates an end-to-end NLP workflow, from raw text preprocessing to model training, evaluation, and real-time inference via API and UI.

⚠️ No transformers were used.
This project intentionally focuses on strong NLP fundamentals before moving to modern LLM-based systems.


UI Preview

[Screenshot: application UI]

---

🚀 Project Overview

The system classifies movie reviews as Positive or Negative using a TF-IDF + Logistic Regression pipeline and exposes predictions through:

  • ✅ A FastAPI REST API
  • ✅ An interactive Streamlit UI

This project is part of my Pre-Transformer NLP Project Series, designed to build deep intuition for text pipelines and production ML systems.


🎯 Objectives

  • Build a classical NLP pipeline from scratch
  • Perform robust text preprocessing
  • Extract features using TF-IDF
  • Train and evaluate a machine-learning classifier
  • Persist and reload model artifacts safely
  • Serve predictions via a REST API
  • (Optional) Provide a human-friendly UI

🧠 System Design & Approach

🔹 Text Preprocessing

  • Lowercasing
  • HTML tag removal
  • URL removal
  • Punctuation & digit removal
  • Stopword removal (NLTK)
  • Stemming (Porter Stemmer)
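The steps above can be sketched as a single cleaning function. This is a minimal illustration, not the contents of src/preprocess.py: the function name `clean_text`, the regular expressions, and the tiny `DEMO_STOPWORDS` set are assumptions made for the example. The stopword set and stemmer are passed in as parameters so that NLTK's versions can be plugged in.

```python
import re
from typing import AbstractSet, Callable, Optional

# Tiny illustrative stopword set (an assumption for this sketch).
# The project uses NLTK's English stopword list instead.
DEMO_STOPWORDS = frozenset(
    {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}
)

_HTML_TAG = re.compile(r"<[^>]+>")             # e.g. <br />, <b>
_URL = re.compile(r"https?://\S+|www\.\S+")    # http(s) links and bare www links
_NON_LETTER = re.compile(r"[^a-z\s]")          # punctuation, digits, symbols


def clean_text(
    text: str,
    stopwords: AbstractSet[str] = DEMO_STOPWORDS,
    stem: Optional[Callable[[str], str]] = None,
) -> str:
    """Apply the preprocessing steps in the order listed above."""
    text = text.lower()                  # lowercasing
    text = _HTML_TAG.sub(" ", text)      # HTML tag removal
    text = _URL.sub(" ", text)           # URL removal
    text = _NON_LETTER.sub(" ", text)    # punctuation & digit removal
    tokens = [tok for tok in text.split() if tok not in stopwords]
    if stem is not None:                 # optional stemming step
        tokens = [stem(tok) for tok in tokens]
    return " ".join(tokens)


print(clean_text("This movie was <b>AMAZING</b>!!! 10/10, see https://example.com"))
# movie amazing see
```

To use NLTK as described above, pass `stopwords=set(nltk.corpus.stopwords.words("english"))` (after `nltk.download("stopwords")`) and `stem=nltk.stem.PorterStemmer().stem`.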

🔹 Feature Extraction

  • TF-IDF Vectorization
  • Unigrams + Bigrams
  • Feature cap for efficiency & generalization
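As a rough sketch of these settings with scikit-learn's `TfidfVectorizer`: `ngram_range=(1, 2)` produces unigrams and bigrams, and `max_features` caps the vocabulary at the most frequent terms. The cap of 5000 and the two-sentence corpus are illustrative assumptions; the value used in src/train.py is not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus for illustration only (already preprocessed text).
corpus = [
    "great movie great acting",
    "boring movie bad acting",
]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),   # unigrams + bigrams
    max_features=5000,    # feature cap; 5000 is an assumed, illustrative value
)
X = vectorizer.fit_transform(corpus)  # sparse matrix: documents x features

print(X.shape)                        # (2, 11): 5 unigrams + 6 bigrams
print(sorted(vectorizer.vocabulary_))
```

Bigrams such as "great movie" become features of their own, which lets a linear model pick up short phrases that single words miss.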

🔹 Model

  • Logistic Regression (binary classification)
  • Probability-based confidence scores
  • Lightweight, fast, and interpretable
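A minimal sketch of how a probability-based confidence score can be derived from Logistic Regression: `predict_proba` returns one probability per class, and the largest one is reported as the confidence. The toy training data, the helper name `predict_sentiment`, and the 1 = positive label encoding are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training set (illustrative only); 1 = positive, 0 = negative.
train_texts = [
    "great movie",
    "wonderful acting",
    "loved it great story",
    "boring movie",
    "terrible acting",
    "hated it boring story",
]
train_labels = [1, 1, 1, 0, 0, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, train_labels)


def predict_sentiment(text: str) -> dict:
    """Hypothetical helper: return a label and the probability of that label."""
    proba = model.predict_proba(vectorizer.transform([text]))[0]
    best = int(proba.argmax())  # index of the most probable class
    label = "positive" if model.classes_[best] == 1 else "negative"
    return {"sentiment": label, "confidence": float(proba[best])}


print(predict_sentiment("great wonderful story"))
```

The learned weights in `model.coef_` line up with the vectorizer's feature names, which is what makes this model straightforward to inspect.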

📂 Project Structure


project-sentiment/
│
├── data/
│   ├── imdb.csv
│   └── imdb_clean.csv
│
├── models/
│   ├── sentiment_model.joblib
│   └── vectorizer.joblib
│
├── src/
│   ├── preprocess.py
│   ├── train.py
│   └── predict.py
│
├── app.py              # Streamlit UI
├── requirements.txt
└── README.md


📦 Installation

1️⃣ Create virtual environment

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

2️⃣ Install dependencies

pip install -r requirements.txt

🏋️ Training the Model

Ensure data/imdb_clean.csv exists.

python -m src.train

This will:

  • Train the sentiment classifier

  • Evaluate performance

  • Save model artifacts:

    • models/sentiment_model.joblib
    • models/vectorizer.joblib
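One way to save and reload the two artifacts is with `joblib`. This is a sketch under assumptions: the helper names `save_artifacts` and `load_artifacts` are hypothetical, and the demo writes to a temporary directory rather than models/. The loader fails early with a clear error if a file is missing.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

MODEL_FILE = "sentiment_model.joblib"
VECTORIZER_FILE = "vectorizer.joblib"


def save_artifacts(model, vectorizer, model_dir) -> None:
    """Hypothetical helper: write both artifacts into model_dir."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, model_dir / MODEL_FILE)
    joblib.dump(vectorizer, model_dir / VECTORIZER_FILE)


def load_artifacts(model_dir):
    """Hypothetical helper: load both artifacts, failing early if one is missing."""
    model_dir = Path(model_dir)
    for name in (MODEL_FILE, VECTORIZER_FILE):
        if not (model_dir / name).is_file():
            raise FileNotFoundError(f"Missing artifact: {model_dir / name}")
    return joblib.load(model_dir / MODEL_FILE), joblib.load(model_dir / VECTORIZER_FILE)


# Round-trip demo on a toy model (illustrative data only).
texts, labels = ["great movie", "boring movie"], [1, 0]
vectorizer = TfidfVectorizer().fit(texts)
model = LogisticRegression().fit(vectorizer.transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(model, vectorizer, tmp)
    loaded_model, loaded_vectorizer = load_artifacts(tmp)

same = bool(
    (loaded_model.predict(loaded_vectorizer.transform(texts))
     == model.predict(vectorizer.transform(texts))).all()
)
print(same)
```

joblib files are pickle-based, so they should only be loaded from sources you trust, ideally with the same scikit-learn version that wrote them.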

📊 Evaluation Metrics (Sample)

Accuracy: ~0.89
F1 Score: ~0.89
Confusion Matrix:
[[TN FP]
 [FN TP]]

(Exact values may vary slightly due to randomness.)
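For reference, these metrics can be computed with `sklearn.metrics` as in the sketch below. The labels are made-up toy values, not results from this project. With `labels=[0, 1]`, `confusion_matrix` returns the layout shown above: true classes as rows, predicted classes as columns.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy labels for illustration only; 1 = positive, 0 = negative.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)            # 4 of 6 correct
f1 = f1_score(y_true, y_pred)                        # F1 for the positive class
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()                          # row-major: [[TN, FP], [FN, TP]]

print(f"Accuracy: {accuracy:.2f}")
print(f"F1 Score: {f1:.2f}")
print(cm)
```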


⚡ API Inference (FastAPI)

Start the server

uvicorn src.predict:app --reload

Open Swagger UI:

http://127.0.0.1:8000/docs

Example CURL Request

curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d "{\"text\": \"This movie was absolutely fantastic\"}"

Example Response

{
  "sentiment": "positive",
  "confidence": 0.97
}
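The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above; the helper names `build_predict_request` and `predict` are hypothetical, and it assumes the server is running locally on port 8000.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local dev server started by uvicorn


def build_predict_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /predict request without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API server to be running.
    print(predict("This movie was absolutely fantastic"))
```

Based on the example response above, the returned dictionary should contain the `sentiment` and `confidence` keys.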

Example String

[Screenshot: example input string]

Example String Output

[Screenshot: example output]

---

🖥️ Interactive UI (Streamlit)

Run:

streamlit run app.py

Provides a simple UI to test predictions interactively.


🧪 Example Predictions

| Review | Prediction |
|---|---|
| Amazing acting and storyline | Positive |
| Boring movie, waste of time | Negative |

🧰 Tech Stack

  • Python
  • NLTK
  • scikit-learn
  • FastAPI
  • Uvicorn
  • Streamlit
  • Pandas / NumPy

🧠 Learning Outcomes

  • Built a production-style NLP system
  • Understood classical NLP pipelines end-to-end
  • Learned artifact management & safe loading
  • Deployed ML inference via REST API
  • Created a user-facing ML demo UI

🔮 Future Improvements

  • Compare TF-IDF vs Word2Vec / GloVe
  • Replace Logistic Regression with LightGBM
  • Add batch inference & logging
  • Upgrade to Transformer-based model
  • Deploy to cloud (Render / Hugging Face Spaces)

👤 Author

Tanish Sarkar
Pre-Transformer NLP Projects
