Natural Language Processing
Dates: 2018 Jan 7 to 2018 Feb 4
in partnership with: Portland Data Science Group
Toxic comment data from Wulczyn, Ellery; Thain, Nithum; Dixon, Lucas (2016): Wikipedia Detox.
Warning: some comments in this data set contain offensive language and content. The project's goal is to flag offensive behavior by predicting comment toxicity with automated NLP.
libraries used:
- NumPy, Pandas
- sklearn: SciKit Learn
- nltk: Natural Language Toolkit
- Matplotlib for visualization
- Keras for Deep Learning neural networks
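
To give a rough idea of what "predicting toxicity" can look like with the libraries listed above, here is a minimal sketch using scikit-learn only. It is not taken from this project's notebooks: the toy comments, the labels, and the helper name `build_toxicity_model` are hypothetical, and TF-IDF features with logistic regression are just one simple baseline one might try before moving on to Keras models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the Wikipedia Detox comments.
train_comments = [
    "you are an idiot",
    "thank you for the helpful edit",
    "shut up you moron",
    "great article, well sourced",
    "you stupid idiot",
    "thanks for fixing the typo",
]
train_labels = [1, 0, 1, 0, 1, 0]  # 1 = toxic, 0 = not toxic


def build_toxicity_model():
    """Return an untrained baseline: TF-IDF features fed to logistic regression."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )


model = build_toxicity_model()
model.fit(train_comments, train_labels)

# Score unseen comments; column 1 of predict_proba is the "toxic" class.
new_comments = ["you are a moron", "thanks for the helpful edit"]
toxic_probability = model.predict_proba(new_comments)[:, 1]
for comment, p in zip(new_comments, toxic_probability):
    print(f"{p:.2f}  {comment}")
```

With only six training comments the scores are illustrative at best; on the real data set one would hold out a test split and compare against the annotated toxicity labels.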
Note 1: some, if not all, data files are found in this repository.
Note 2: All data may be found in this site: http://dive-into.info/ (Note: In my Windows Chrome browser, it was necessary to access this site incognito.)