The goal of this project is to predict the topic of research articles from their titles and abstracts. The six possible topics are Computer Science, Physics, Mathematics, Statistics, Quantitative Biology, and Quantitative Finance. We also identify the 10 most frequent words in each topic. The dataset is available on Kaggle: https://www.kaggle.com/vetrirah/janatahack-independence-day-2020-ml-hackathon.
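The word-frequency analysis could be approached along the lines of the minimal base-R sketch below. It is not the project's own code. It assumes that each row holds an article's `TITLE` and `ABSTRACT` plus one 0/1 indicator column per topic, as in the Kaggle training file. The helper names `top_words()` and `top_words_by_topic()` and the toy data frame are hypothetical and exist only for illustration.

```r
# Hypothetical sketch (base R only): the n most frequent words per topic.
# Assumes one row per article with TITLE, ABSTRACT and a 0/1 column per topic.

# Count words in a character vector and return the n most frequent ones.
top_words <- function(text, n = 10, stopwords = character(0)) {
  # Lower-case, turn every run of non-letters into a space, split into tokens
  cleaned <- gsub("[^a-z]+", " ", tolower(text))
  tokens <- as.character(unlist(strsplit(cleaned, "\\s+")))
  # Drop empty strings (from leading spaces) and any supplied stop words
  tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stopwords)]
  # Tabulate and keep the n largest counts
  head(sort(table(tokens), decreasing = TRUE), n)
}

# Apply top_words() to the titles and abstracts of each topic's articles.
top_words_by_topic <- function(df, topics, n = 10, stopwords = character(0)) {
  lapply(setNames(topics, topics), function(topic) {
    rows <- df[[topic]] == 1
    top_words(paste(df$TITLE[rows], df$ABSTRACT[rows]), n, stopwords)
  })
}

# Toy data standing in for the real training file (hypothetical)
articles <- data.frame(
  TITLE = c("Neural networks for graphs", "Quantum spin chains"),
  ABSTRACT = c("We train neural networks on graphs.", "Spin chains are studied."),
  `Computer Science` = c(1, 0),
  Physics = c(0, 1),
  check.names = FALSE
)

res <- top_words_by_topic(articles, c("Computer Science", "Physics"), n = 3,
                          stopwords = c("for", "on", "we", "are"))
print(res)
```

On real abstracts a proper stop-word list matters: without one, function words such as "the" and "of" would likely dominate every topic's top 10.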
To run the code, open the R script in an R IDE such as RStudio and run it.