Sentiment Analysis of Tweets regarding Technology and Innovation in the UAE

This project was an assignment for our Text Analytics course @ Heriot Watt University. It looks at tweets in order to gain insights into people's views on the current state of technology, the innovations, and technological advancements made in the UAE.

Motivation

Sentiments from Twitter could be useful for businesses and government entities.

Method and results

Methods

The standard NLP workflow is abided by. What was done in each step of the workflow is briefly described below:

Text-Preprocessing. Related tweets are scraped using Tweepy.
Data Labelling. TextBlob and Vader are used to automate labelling of tweets as positive, negative, or neutral. They are then manually examined to determine which would be more accurate.
Data pre-processing. Standard pre-processing- stop words are removed, tweets are tokenized, lemmatization/stemming, ... . Lemmatization and stemming and compared against a classifier, and the best one is used in the next step.
Text Representation. The following text representation techniques were used and compared against each other: BOW, N-grams, TF-IDF, CBOW, Skip-gram, and the pre-trained W2V model. The classifier used for comparison was the Multinomial Naive Bayes classifier, with the main metrics of comaparison precision, recall, and f1 due to class imbalance.
Classification. The best text representation models were used in a variety of classification models: LR, Facebook's FastText, and Decision Trees.
Evaluation, Visualization, and Insights. The precision, recall, and F1 of the models are compared. Then, positive, negative and neutral tweets resulting from the best classifier are inspected.

Results

The best performing pipeline consisted of lemmatization for the normalization of tweets, N-gram range 1 to 6 for text representation, and decision tree for classifiation, resulting in weighted prec and recall of .85 and weighted F1 of .84.

Libraries Used

Tweepy Vader TextBlob SkLearn NLTK Numpy Pandas Gensim FastText MatPlotLib PyLDAvis

Running instructions

A Twitter Developer account will have to be created in order to retrieve authentication keys to scrape tweets.

To run the project, simply fork it, plug in the authentication credentials from your the Twitter Dev account into the 'Scraping Tweets' section and run the notebook.

About

Contributors: Alora Tabuco, Bhavika Kaliya, Andrea Nabua

Task Distribution:

Member	Tasks
Bhavika	Data Cleaning and Preprocessing, Logistic Regression, Visualization
Alora	Data Labelling and Cleaning, FastText, Decision Trees and Conclusions
Andrea	Project Overview, Tweet Scraping, Text Representation

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
ConflictingLabels.csv		ConflictingLabels.csv
README.md		README.md
cw-1.ipynb		cw-1.ipynb
cw1-specs.docx		cw1-specs.docx
dataset.csv		dataset.csv
test.txt		test.txt
train.txt		train.txt
updated-dataset.csv		updated-dataset.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment Analysis of Tweets regarding Technology and Innovation in the UAE

Motivation

Method and results

Methods

Results

Libraries Used

Running instructions

About

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

psychoanalyzing-twitter/sentiment-analysis

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis of Tweets regarding Technology and Innovation in the UAE

Motivation

Method and results

Methods

Results

Libraries Used

Running instructions

About

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages