Skip to content

racheltgunawan/Tweet-Classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Tweet-Classifier project using pandas and vectorizers

Overview

This project was created so that I could know more about AI and Machine Learning models. Following my technica project regarding the origins of hate speech, I wanted to see if I could create my own dataset regarding classifying hate speech. I took an already classified datasets from the internet, and trained a Machine Learning model so that I can recreate the algorithm on a new dataset.

Methods

  1. Clean the data. I removed any punctuation.
  2. Put the tweets into a "bag of words."
    • This bag of words was a count vectorizer which counts the amount of times a word appears in a sentence. If the tweet is classified as a hate tweet, then the model will think that the more times that word appears in a tweet, it will most likely be associated with hate comments.
  3. Train the model.
    • Imported a logistic regression model

Results

The article that assisted me with this project achieved an 89% accuracy on their model, while my project ended up achieving a 96% accuracy. This is due to the fact that I added "stop words" into the bag of words. Getting rid of stop words will allow the model to achieve a higher accuracy because it disregards words that are unnecessary such as "and", "the", "this", etc.

Screen Shot 2023-08-16 at 5 53 33 PM

Challenges

I had never worked with machine learning models before and this was my first interaction with ML. Because of my lack of prior knowledge for ML and its algorithms, it took me a while to understand how the models work. However, after much researching and reading into documentation, I was able to complete the project with a higher accuracy than I anticipated.

About

Classifies tweets as either neutral or hate

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published