Course Review Sentiment Analysis

This project analyzes user reviews of online courses and classifies each review as 'good' or 'bad'. The analysis lives in a Jupyter Notebook (Main.ipynb) and covers data cleaning, exploratory data analysis (EDA), and a machine learning model for sentiment classification.

Datasets

The project uses two CSV files:

user_courses_review_09_2023.csv: The primary dataset containing user reviews from September 2023. It includes the following columns: course_name, lecture_name, review_rating, and review_comment.
user_courses_review_test_set.csv: A separate, held-out dataset used to test the final model. It has the same structure as the primary dataset.
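As a quick sanity check before running the notebook, the column layout described above can be verified with pandas. This is a minimal sketch, not code taken from Main.ipynb: the helper name `load_reviews` is hypothetical, and the two sample rows are invented stand-ins so the snippet runs without the real CSV files.

```python
import io

import pandas as pd

# Columns documented for both CSV files.
EXPECTED_COLUMNS = ["course_name", "lecture_name", "review_rating", "review_comment"]


def load_reviews(source):
    """Read a review CSV and check that the documented columns are present."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[EXPECTED_COLUMNS]


# Invented sample rows standing in for user_courses_review_09_2023.csv.
sample_csv = io.StringIO(
    "course_name,lecture_name,review_rating,review_comment\n"
    "Intro to SQL,Joins,5,Great lecture!\n"
    "Intro to SQL,Indexes,2,Too fast.\n"
)
reviews = load_reviews(sample_csv)
print(reviews.shape)
```

To check the real data, pass the file path instead, for example `load_reviews("user_courses_review_09_2023.csv")`.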

Requirements

To run the analysis in this repository, you will need Python 3 and the following libraries:

pandas
matplotlib
seaborn
scikit-learn
imbalanced-learn (imblearn)
jupyter

You can install these dependencies using pip:

pip install pandas matplotlib seaborn scikit-learn imbalanced-learn jupyter

How to Run

The analysis runs in Jupyter Notebook. To open it:

Clone this repository to your local machine.
Navigate to the repository's directory in your terminal.
Start Jupyter Notebook by running:
```
jupyter notebook
```
From the Jupyter interface in your browser, open the Main.ipynb file.
Run the notebook cells in order to reproduce the full analysis.

Methodology

The project follows these steps:

Data Cleaning and Preparation: The initial dataset is loaded, and missing values are removed. The review ratings are converted to a numeric format, and the review comments are cleaned by converting them to lowercase and removing punctuation.
Exploratory Data Analysis (EDA): The distribution of review ratings is analyzed, revealing a significant class imbalance with a majority of 5-star ratings. The relationship between comment length and review rating is also explored.
Modeling:
- Initial Approach (Multi-class Classification): An initial attempt to classify reviews into one of five ratings (1-5) using a Multinomial Naive Bayes model was unsuccessful due to the severe class imbalance. The model tended to predict the majority class (5-star rating) for most reviews.
- Binary Classification: The problem was simplified to a binary classification task by categorizing ratings of 4 or 5 as 'good' and ratings of 1 to 3 as 'bad'.
- Handling Class Imbalance: The binary task was still imbalanced, with far fewer 'bad' reviews than 'good' ones. To address this, the minority class ('bad' reviews) was oversampled to match the number of 'good' reviews, which balanced the training data and improved the model's performance.
Model Testing: The final model, a Multinomial Naive Bayes classifier trained on the upsampled binary data, was evaluated on user_courses_review_test_set.csv to measure its performance on unseen data.
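The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the function names (`clean_comment`, `prepare`, `upsample_minority`) are hypothetical, the bag-of-words features via `CountVectorizer` are an assumption (this README does not say how text is vectorized), and `sklearn.utils.resample` stands in for the oversampling step, which the notebook may instead perform with imbalanced-learn. The toy reviews are invented.

```python
import string

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.utils import resample


def clean_comment(text):
    """Lowercase a comment and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def prepare(df):
    """Coerce ratings to numbers, drop missing rows, clean text, add a binary label."""
    df = df.copy()
    df["review_rating"] = pd.to_numeric(df["review_rating"], errors="coerce")
    df = df.dropna(subset=["review_rating", "review_comment"]).copy()
    df["review_comment"] = df["review_comment"].astype(str).map(clean_comment)
    # Ratings of 4 or 5 count as 'good'; ratings of 1 to 3 count as 'bad'.
    df["sentiment"] = ["good" if r >= 4 else "bad" for r in df["review_rating"]]
    return df


def upsample_minority(df, minority="bad", seed=0):
    """Sample the minority class with replacement until it matches the majority."""
    minority_rows = df[df["sentiment"] == minority]
    majority_rows = df[df["sentiment"] != minority]
    minority_up = resample(
        minority_rows,
        replace=True,
        n_samples=len(majority_rows),
        random_state=seed,
    )
    return pd.concat([majority_rows, minority_up], ignore_index=True)


# Invented toy data: four 'good' reviews, one 'bad' review, one row with no rating.
toy = pd.DataFrame(
    {
        "review_rating": ["5", "5", "4", "5", "2", None],
        "review_comment": [
            "Great course!",
            "Loved it.",
            "Very clear, great.",
            "Great examples",
            "Boring and confusing...",
            "n/a",
        ],
    }
)

prepared = prepare(toy)
balanced = upsample_minority(prepared)

# Assumed feature extraction: raw word counts fed to Multinomial Naive Bayes.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(balanced["review_comment"], balanced["sentiment"])
print(balanced["sentiment"].value_counts().to_dict())
```

In this sketch only the training data is oversampled; the test set would be passed through `prepare` alone, so that it keeps its original class distribution.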

Results

The final model achieved an accuracy of approximately 83% on the held-out test set (user_courses_review_test_set.csv). This suggests the model is effective at classifying course reviews as 'good' or 'bad', despite the highly imbalanced original dataset. The confusion matrix for the test set shows that the model is strongest at identifying 'good' reviews, while still correctly classifying a majority of the 'bad' reviews.
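For readers who want to reproduce this kind of evaluation, the sketch below shows how accuracy and a confusion matrix can be computed with scikit-learn. The labels here are invented for illustration and are not the project's predictions, so the numbers do not correspond to the 83% figure above.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented true and predicted labels, for illustration only.
y_true = ["good", "good", "good", "good", "bad", "bad"]
y_pred = ["good", "good", "good", "bad", "bad", "good"]

labels = ["bad", "good"]  # fixes the row/column order of the matrix
acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Rows are true classes, columns are predicted classes.
# Per-class recall is the diagonal divided by each row total.
recall = {label: cm[i, i] / cm[i].sum() for i, label in enumerate(labels)}

print(f"accuracy: {acc:.2f}")
print(cm)
print(recall)
```

Reading per-class recall alongside accuracy is useful on imbalanced data, because a model can score well overall while still missing many minority-class ('bad') reviews.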
