This project uses NLP (Natural Language Processing) to classify comments on the basis of toxicity, that is, whether a comment is rude or disrespectful. If such toxic contributions can be identified, we could have a safer, more collaborative internet. The contest was hosted on Kaggle. The model was built with multilingual BERT (the pretrained `bert-base-multilingual-uncased` model) and trained on a TPU v3-8 (Tensor Processing Unit). It achieved an AUC score of 0.915.
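To make the reported score concrete, here is a minimal sketch of how ROC AUC can be computed, assuming the metric is the area under the ROC curve (as used for this competition) with label 1 meaning "toxic". It uses the equivalent pairwise definition: the probability that a randomly chosen toxic comment receives a higher score than a randomly chosen non-toxic one, with ties counting as one half. The function name `roc_auc` is illustrative and not part of this project's code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (illustrative sketch).

    labels: 1 for toxic, 0 for non-toxic.
    scores: the model's predicted toxicity scores, one per comment.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one toxic and one non-toxic example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:        # toxic comment ranked above non-toxic one
                wins += 1.0
            elif p == n:     # ties count as half a correct ranking
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two non-toxic and two toxic comments with hypothetical scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is O(n·m) and is meant only to show what the number means; for a full validation set, `sklearn.metrics.roc_auc_score` computes the same value efficiently. An AUC of 0.915 therefore means that, for a random toxic/non-toxic pair, the model ranks the toxic comment higher about 91.5% of the time.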
The files above are the main code files used to build the machine learning model.
Because the dataset used in this project is very large, it is not included here. Instead, the links below point to the full data I used, so you can download it and use it for your own model.
- Official dataset provided by Kaggle for the contest: https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification/data
- BERT Multilingual Uncased dataset (author: Abhishek Thakur): https://www.kaggle.com/abhishek/bert-base-multilingual-uncased
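Both downloads can also be scripted with the official Kaggle command-line tool. The sketch below only builds and prints the commands; it assumes the `kaggle` CLI is installed, an API token is configured in `~/.kaggle/kaggle.json`, and the competition rules have been accepted on the website. The slugs are taken from the two URLs above; the helper names `download_commands` and `download` and the `input` destination folder are hypothetical.

```python
import shlex
import subprocess
from typing import List

# Slugs taken from the dataset URLs listed above.
COMPETITION = "jigsaw-multilingual-toxic-comment-classification"
BERT_DATASET = "abhishek/bert-base-multilingual-uncased"


def download_commands(dest: str = "input") -> List[List[str]]:
    """Return the Kaggle CLI commands for both downloads (hypothetical helper)."""
    return [
        # Competition data arrives as a zip archive in `dest`.
        ["kaggle", "competitions", "download", "-c", COMPETITION, "-p", dest],
        # Pretrained multilingual BERT files, unzipped into `dest`.
        ["kaggle", "datasets", "download", "-d", BERT_DATASET, "-p", dest, "--unzip"],
    ]


def download(dest: str = "input") -> None:
    """Actually run the commands (requires network access and Kaggle credentials)."""
    for cmd in download_commands(dest):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands so they can be inspected or pasted into a shell.
    for cmd in download_commands():
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to check the commands before spending bandwidth on a large download.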