Reddit Classification

This mini-project implements models that analyze text from Reddit (https://www.reddit.com/), a popular social media website where users post and comment on content in themed communities called subreddits. The goal of the project is to develop a supervised classification model that predicts which subreddit a given comment came from.

To run the project, the following libraries must be installed:

numpy
nltk
sklearn (installed as the scikit-learn package)
pandas

To run the project using the sklearn models, use the following command:

python3 main.py

or

./main.py
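For readers new to sklearn, the following is a minimal sketch of a text classification pipeline of the general kind this task calls for. It is not the contents of main.py: the TF-IDF features, the logistic regression classifier, and the toy comments and subreddit labels are all assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: comments paired with the subreddit they came from.
comments = [
    "the goalie made an incredible save last night",
    "what a goal in the final minute",
    "my gpu drivers keep crashing after the update",
    "which cpu should I pair with this motherboard",
]
subreddits = ["hockey", "hockey", "pcmasterrace", "pcmasterrace"]

# TfidfVectorizer turns each comment into a sparse, weighted bag-of-words vector;
# LogisticRegression then learns a linear decision function over those features.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, subreddits)

# Predict the subreddit of an unseen comment.
print(model.predict(["the goalie let in a soft goal"]))
```

The pipeline object bundles vectorization and classification, so the same fitted vocabulary is applied to new comments at prediction time.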

To run the project using our in-house Bernoulli Naive Bayes implementation, use the following command:

python3 bernoulli_naive_bayes.py
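As an illustration of the technique, here is a minimal Bernoulli Naive Bayes sketch using only numpy. It is not the code in bernoulli_naive_bayes.py: the class name SimpleBernoulliNB, the Laplace smoothing parameter alpha, and the toy binary feature matrix are assumptions. Each feature is treated as a binary word-presence indicator, and a class is scored by its log prior plus, for every feature j, log p_j if the word is present or log(1 - p_j) if it is absent.

```python
import numpy as np


class SimpleBernoulliNB:
    """Hypothetical minimal Bernoulli Naive Bayes for binary feature matrices."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed default)

    def fit(self, X, y):
        X = (np.asarray(X) > 0).astype(float)  # binarize: word present or not
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n_features = X.shape[1]
        self.log_prior_ = np.empty(len(self.classes_))
        self.feature_prob_ = np.empty((len(self.classes_), n_features))
        for i, c in enumerate(self.classes_):
            Xc = X[y == c]
            # log P(class = c) from class frequencies
            self.log_prior_[i] = np.log(Xc.shape[0] / X.shape[0])
            # P(x_j = 1 | class = c), smoothed so it is never exactly 0 or 1
            self.feature_prob_[i] = (Xc.sum(axis=0) + self.alpha) / (
                Xc.shape[0] + 2 * self.alpha
            )
        return self

    def joint_log_likelihood(self, X):
        X = (np.asarray(X) > 0).astype(float)
        log_p = np.log(self.feature_prob_)
        log_not_p = np.log(1.0 - self.feature_prob_)
        # sum_j [x_j log p_j + (1 - x_j) log(1 - p_j)] + log prior, per class
        return X @ log_p.T + (1.0 - X) @ log_not_p.T + self.log_prior_

    def predict(self, X):
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]


# Hypothetical toy data: rows are comments, columns are word-presence flags.
X_train = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_train = ["a", "a", "b", "b"]
model = SimpleBernoulliNB().fit(X_train, y_train)
print(model.predict([[1, 0, 0], [0, 0, 1]]))
```

Unlike a multinomial model, the Bernoulli variant explicitly penalizes a class when a word that is common in that class is absent from the comment, through the (1 - x_j) log(1 - p_j) term.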

NOTE: due to the large number of features used, the main.py script seems not to work properly on Linux platforms. Running it on macOS should resolve the issue.

Repository: AntonGladyr/RedditClassification

Folders and files

Latest commit

History

Repository files navigation

Reddit Classification

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages