Multilingual-Toxic-Comment-Classification

This project uses NLP (Natural Language Processing) to classify comments by toxicity (rude or disrespectful comments). If such toxic contributions can be identified, we could have a safer, more collaborative internet. The contest was hosted on Kaggle. The model was built with multilingual BERT (bert-base-multilingual-uncased) and trained on a TPU v3-8 (Tensor Processing Unit). My model achieved an AUC score of 0.915.
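
The reported score is an AUC. Assuming it is the area under the ROC curve (ROC AUC), the usual metric for this kind of binary toxic/non-toxic task, the sketch below shows how that number can be computed from true labels and predicted scores. It uses the rank-based (Mann-Whitney U) formulation and is an illustration, not the code from the notebooks.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic.

    labels: iterable of 0/1 ground-truth values (1 = toxic).
    scores: iterable of predicted scores; higher means "more toxic".
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the block of equal scores starting at i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Sum of ranks of positives, minus the minimum possible sum, normalized
    # by the number of (positive, negative) pairs.
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity. The hand-written version just makes the definition explicit: the probability that a randomly chosen toxic comment is scored higher than a randomly chosen non-toxic one.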

The files in this repository are the primary code files used to build the machine learning model.

Since the dataset used in this project is very large, I am providing links to the full data I used so that you can download it for your own model.

Official dataset provided by Kaggle for the contest: https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/data
BERT Multilingual Uncased dataset (author: Abhishek Thakur): https://www.kaggle.com/abhishek/bert-base-multilingual-uncased
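
Once the competition data is downloaded, a first step is usually to read the CSV files into (text, label) pairs and check the class balance. The sketch below uses only the Python standard library. The column names `comment_text` and `toxic` are assumptions based on the Jigsaw training files, so check the header of the file you actually download; the function name `load_comments` and the 0.5 threshold (for files that store fractional toxicity scores) are hypothetical choices for illustration.

```python
import csv
import io
from collections import Counter


def load_comments(fileobj, text_col="comment_text", label_col="toxic", threshold=0.5):
    """Read (text, 0/1 label) pairs from a Jigsaw-style CSV file object.

    text_col / label_col are ASSUMED column names; adjust to the real header.
    Labels are converted to float and binarized at `threshold`, so both
    0/1 labels and fractional scores are handled.
    """
    reader = csv.DictReader(fileobj)
    rows = []
    for row in reader:
        label = 1 if float(row[label_col]) >= threshold else 0
        rows.append((row[text_col], label))
    return rows


# Tiny in-memory sample standing in for a downloaded CSV (made-up rows).
sample = io.StringIO(
    'id,comment_text,toxic\n'
    '1,"Thanks, nice edit",0\n'
    '2,you are an idiot,1\n'
    '3,Great work!,0\n'
)
rows = load_comments(sample)
print(Counter(label for _, label in rows))
```

With a real file, replace the `io.StringIO` sample with `open(path, newline="", encoding="utf-8")`. A heavily skewed label count is a hint that a ranking metric such as AUC is more informative than plain accuracy.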

Files in this repository:
A_Anti_knapsack.cpp
README.md
jmtc-1.ipynb
jmtc-final-output.ipynb

Repository: shambhavgo/Multilingual-Toxic-Comment-Classification