


shambhavgo/Multilingual-Toxic-Comment-Classification

 
 


Multilingual-Toxic-Comment-Classification

This project uses NLP (Natural Language Processing) to classify comments by toxicity, meaning anything rude or disrespectful. If these toxic contributions can be identified, we could have a safer, more collaborative internet. The project was built for a competition hosted on Kaggle. The model is based on BERT Multilingual and was trained on a TPU v3-8 (Tensor Processing Unit). My model achieved an AUC score of 0.915.
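For readers unfamiliar with the metric, here is a minimal sketch of how an AUC (area under the ROC curve) score can be computed from binary labels and predicted scores. It is an illustration only, not the code used in this repository; the function name `roc_auc` and the sample values are made up for the example.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (toxic, non-toxic) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = toxic, 0 = non-toxic.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

An AUC of 1.0 means every toxic comment is scored above every non-toxic one, while 0.5 corresponds to random ranking. The pairwise loop is quadratic, so for a large evaluation set a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.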

The files in this repository are the primary code files used to build the machine learning model.

Since the dataset used in this project is very large, I am providing links to the full datasets I used so that you can download them for use in your own model.

  1. Official dataset provided by Kaggle during the contest: https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/data
  2. BERT Multilingual Uncased dataset (author: Abhishek Thakur): https://www.kaggle.com/abhishek/bert-base-multilingual-uncased
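Once the data is downloaded, the comments can be read with a few lines of Python. The following is only a sketch: it assumes a CSV file with a text column named `comment_text` and a 0/1 label column named `toxic`, and the helper `load_comments` is hypothetical. Check the actual file and column names on the dataset page before using it.

```python
import csv
import io


def load_comments(handle, text_col="comment_text", label_col="toxic"):
    """Read (text, label) pairs from an open CSV file handle.

    The column names are assumptions; adjust them to match the real files.
    """
    rows = []
    for record in csv.DictReader(handle):
        # Labels may be stored as "0"/"1" or "0.0"/"1.0", so go through float.
        rows.append((record[text_col], int(float(record[label_col]))))
    return rows


# In-memory stand-in for a downloaded CSV so the sketch runs without the data.
sample = io.StringIO(
    "id,comment_text,toxic\n"
    '1,"Thanks, that was helpful",0\n'
    '2,"<a rude comment>",1\n'
)
rows = load_comments(sample)
print(rows)
```

With a real file, replace the `io.StringIO` stand-in with `open(path, newline="", encoding="utf-8")`.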
