Sentiment Analysis - Donald Trump Tweets

Project Overview

This repository contains the coursework and analysis for a sentiment analysis project. The primary objective was a comparative study of sentiment analysis techniques applied to Donald Trump's historical Twitter data. The project contrasts a lexicon-based approach (VADER) with several supervised machine learning classifiers for labeling the emotional tone of political communication.

Authors

Weronika Mądro
Wojciech Hrycenko

Repository Contents

1. Sentiment Analysis: Donald Trump Tweets

File: Sentiment_Analysis_Madro_Hrycenko.ipynb

Objective

The goal of this project was to analyze the sentiment of the tweets in four steps:

- Apply lexicon-based analysis using VADER to generate sentiment scores (Compound, Positive, Neutral, Negative).
- Transform the text data into numerical features using TF-IDF.
- Train and evaluate supervised classification models using labels derived from the VADER scores.
- Optimize model hyperparameters to maximize predictive performance.

Dataset

The project uses the realdonaldtrump.csv dataset, which contains over 43,000 tweets from Donald Trump (up to June 2020). The analysis was performed on the content column after extensive cleaning.

Methodology

Data Analysis & Preprocessing:
- Text Cleaning: Removal of URLs, user mentions (@user), hashtags (#), punctuation, and numbers; conversion to lowercase.
- Normalization: Tokenization and removal of English stopwords.
- EDA: Frequency analysis of the most common terms (e.g., "realdonaldtrump", "great", "fake news") and visualization of sentiment score distributions.
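
The cleaning and normalization steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function name `clean_tweet`, the regular expressions, and the tiny `STOPWORDS` set are assumptions made for the example (the notebook uses NLTK's English stopword list). It also assumes that whole @mentions are dropped while only the "#" symbol is stripped from hashtags; the notebook may handle either differently.

```python
import re

# Tiny stand-in stopword list (assumption for this sketch);
# the notebook uses NLTK's English stopwords instead.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and",
             "in", "on", "for", "at", "we", "will", "be"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # http(s) and www links
MENTION_RE = re.compile(r"@\w+")                # whole @user tokens
NON_LETTER_RE = re.compile(r"[^a-z\s]")         # '#', punctuation, digits


def clean_tweet(text):
    """Lowercase a tweet, strip noise, and return its non-stopword tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = text.split()  # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


tokens = clean_tweet("Thank you @FoxNews! We will WIN in 2020 #MAGA https://t.co/abc123")
print(tokens)  # ['thank', 'you', 'win', 'maga']
```

Joining the tokens back with `" ".join(tokens)` gives a cleaned string that can be passed to a vectorizer.
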
Lexicon-Based Approach (VADER):
- Utilized SentimentIntensityAnalyzer to compute polarity scores.
- The Compound score was used to label tweets for the supervised learning stage.
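
One way the Compound score could be turned into class labels is sketched below. The helper names `vader_scores` and `label_from_compound` are hypothetical, and the ±0.05 cut-offs are the thresholds conventionally suggested for VADER, not values confirmed by this README; the notebook may use different thresholds or only two classes. `vader_scores` requires NLTK and a one-time `nltk.download("vader_lexicon")`.

```python
def vader_scores(text):
    """Return VADER's neg/neu/pos/compound scores for one text.

    Requires nltk plus a one-time nltk.download("vader_lexicon").
    The import is kept local so the labeling helper below works without nltk.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores(text)


def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is an assumption (the conventional VADER cut-off).
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


print(label_from_compound(0.6))   # positive
print(label_from_compound(-0.3))  # negative
print(label_from_compound(0.0))   # neutral
```

With the lexicon installed, a tweet would be labeled with `label_from_compound(vader_scores(tweet)["compound"])`.
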
Supervised Modeling:
- Feature Extraction: Implemented TfidfVectorizer to convert cleaned text into weighted feature vectors.
- Algorithms: Trained and compared multiple classifiers:
  - Logistic Regression
  - Linear SVM (LinearSVC)
  - Decision Trees
  - K-Nearest Neighbors (KNN)
  - Random Forest
- Optimization: Applied GridSearchCV and RandomizedSearchCV for hyperparameter tuning.
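
The feature extraction, training, and tuning steps can be combined in a single scikit-learn pipeline. The sketch below is illustrative only: the toy texts, the two-class labels, and the searched values of `C` are assumptions made for the example and are not the notebook's data or grid.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for cleaned tweets and their VADER-derived labels (assumption).
texts = [
    "great win tonight", "amazing crowd thank you",
    "wonderful people great job", "love this country great",
    "fake news terrible", "total disaster bad deal",
    "failing media very dishonest", "sad weak terrible leadership",
]
labels = ["positive"] * 4 + ["negative"] * 4

# TF-IDF features feed a linear SVM; both steps are refit inside each CV fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# Search over the SVM regularization strength (hypothetical grid).
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
predictions = search.predict(["great win", "fake news"])
print(predictions)
```

Keeping the vectorizer inside the pipeline means the vocabulary and IDF weights are learned from the training folds only, which avoids leaking information from the validation folds during tuning. `RandomizedSearchCV` takes the same pipeline and samples a fixed number of candidates (`n_iter`) instead of trying every combination.
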
Evaluation:
- Performance measured using Accuracy, ROC AUC, and Confusion Matrices.
- The linear models (Linear SVM and Logistic Regression) outperformed the non-linear models (Decision Trees, KNN, Random Forest).
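
The three metrics can be computed with `sklearn.metrics`, as in the minimal sketch below. The labels and scores are made-up toy values for a binary case, not results from the notebook. ROC AUC needs a continuous score per sample, for example `decision_function` output for LinearSVC or `predict_proba` for Logistic Regression; a three-class setup would need a multiclass strategy such as one-vs-rest.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Toy binary example (assumption): 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]                 # hard class predictions
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6]    # continuous scores for class 1

accuracy = accuracy_score(y_true, y_pred)                # 4 of 6 correct
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1]) # rows: true, cols: predicted
auc = roc_auc_score(y_true, y_score)                     # 8 of 9 pos/neg pairs ranked correctly

print(round(accuracy, 3))  # 0.667
print(matrix.tolist())     # [[2, 1], [1, 2]]
print(round(auc, 3))       # 0.889
```
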

Technologies and Libraries

The project was developed in Python, utilizing the following key libraries:

NLTK: For natural language processing tasks (Stopwords, VADER Sentiment Intensity Analyzer).
Scikit-learn: For machine learning models (e.g., LogisticRegression, LinearSVC, RandomForestClassifier), feature extraction (TfidfVectorizer), hyperparameter search (GridSearchCV, RandomizedSearchCV), and evaluation metrics.
Pandas & NumPy: For efficient data manipulation and numerical analysis.
Matplotlib & Seaborn: For plotting data distributions and model results.
WordCloud: For visualizing the most frequent terms in the corpus.
Jupyter Notebook: Used as the interactive development environment.

Usage Instructions

1. Clone this repository to your local machine.
2. Ensure all required dependencies are installed (refer to the library list above).
3. Download the realdonaldtrump.csv dataset and place it in the same directory as the notebook.
4. Navigate to that directory, open Sentiment_Analysis_Madro_Hrycenko.ipynb in Jupyter Notebook, and run the cells to reproduce the data cleaning, VADER scoring, and model training steps.
