A machine learning model that uses collaborative filtering to recommend anime to users, written in Python using PyTorch, NumPy and Pandas.

greenfish8090/Anime-Recommender-System

Anime Recommender System

A machine learning model that uses collaborative filtering to generate personalized anime recommendations for users, written in Python using PyTorch, NumPy and Pandas. To learn more about the theory, check out the project page on my website!

Dataset and preprocessing

The dataset is a subset of the MyAnimeList dataset, which contains around 80 million ratings of 14k anime by 300k users. The 5,000 most popular anime and 100,000 randomly sampled users who rated at least 50 anime make up the "cleaned up" dataset on which the latent features were trained.
Dask was used to process the relatively large dataset in a distributed way.
You can find the code for this in MAL Dataset/Dataset prep.ipynb.
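As a rough illustration of the filtering described above, here is a minimal sketch in plain pandas on toy data (the notebook itself uses Dask). The column names `user_id`, `anime_id` and `rating`, the function name `filter_ratings`, and the choice to measure popularity by rating count and to count a user's ratings before the anime filter are all assumptions made for this example.

```python
import pandas as pd

def filter_ratings(ratings, n_anime, n_users, min_ratings, seed=0):
    """Keep the n_anime most-rated anime and a random sample of active users."""
    # Assumption: "popular" means "has the most ratings".
    top_anime = ratings["anime_id"].value_counts().nlargest(n_anime).index
    # Assumption: a user's rating count is taken over the full dataset.
    counts = ratings["user_id"].value_counts()
    eligible = counts[counts >= min_ratings].index.to_series()
    sampled = eligible.sample(n=min(n_users, len(eligible)), random_state=seed)
    mask = ratings["anime_id"].isin(top_anime) & ratings["user_id"].isin(sampled)
    return ratings[mask].reset_index(drop=True)

# Toy stand-in for the real ratings table.
toy = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "anime_id": [10, 20, 30, 10, 20, 10, 20, 40, 10],
    "rating":   [8, 7, 9, 6, 5, 9, 8, 7, 10],
})
cleaned = filter_ratings(toy, n_anime=2, n_users=2, min_ratings=3)
```

With the real data the same call would use `n_anime=5000`, `n_users=100000` and `min_ratings=50`.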

Training

Matrix factorization is at the heart of this algorithm. The 5,000 x 100,000 dimensional sparse ratings matrix is decomposed into a 5,000 x 10 dimensional 'anime_matrix' and a 10 x 100,000 dimensional 'user_matrix'. These matrices are initialized at random and iteratively updated so that their product matches the original ratings matrix on the entries that were actually rated. The product then also has values at the positions that were empty, so we will have essentially "filled in" the missing ratings.
PyTorch was used to leverage the GPU and speed up training significantly. I also used autograd and an optimizer from PyTorch because why not.
A walkthrough and the code can be found in Training.ipynb.
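The idea can be sketched on a toy matrix as follows. This is not the notebook's code: the function name `factorize`, the choice of Adam, the mean squared error over observed entries, and all hyperparameters are assumptions picked for illustration, and everything stays on the CPU.

```python
import torch

def factorize(ratings, mask, n_factors=10, epochs=500, lr=0.05, seed=0):
    """Fit ratings ~ anime_matrix @ user_matrix using only the observed entries."""
    gen = torch.Generator().manual_seed(seed)
    n_anime, n_users = ratings.shape
    # Small random initialization; requires_grad_() makes these trainable leaves.
    anime_matrix = (0.1 * torch.randn(n_anime, n_factors, generator=gen)).requires_grad_()
    user_matrix = (0.1 * torch.randn(n_factors, n_users, generator=gen)).requires_grad_()
    optimizer = torch.optim.Adam([anime_matrix, user_matrix], lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        pred = anime_matrix @ user_matrix
        # Missing ratings are excluded from the loss via the boolean mask.
        loss = ((pred - ratings)[mask] ** 2).mean()
        loss.backward()  # autograd computes gradients for both matrices
        optimizer.step()
    return anime_matrix.detach(), user_matrix.detach(), loss.item()

# Toy data: a 6 x 8 rank-2 "ratings" matrix with roughly 70% of entries observed.
gen = torch.Generator().manual_seed(1)
ratings = torch.randn(6, 2, generator=gen) @ torch.randn(2, 8, generator=gen)
mask = torch.rand(6, 8, generator=gen) < 0.7

anime_matrix, user_matrix, final_loss = factorize(ratings, mask, n_factors=2, epochs=1000)
baseline = (ratings[mask] ** 2).mean().item()  # loss of predicting 0 everywhere
filled_in = anime_matrix @ user_matrix         # has a value for every (anime, user) pair
```

Here `filled_in` has a prediction at every position, including those where `mask` is `False`, which is the "filling in" described above.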

Prediction

When we need to recommend anime to a user who wasn't part of the 100,000 trained users, we fetch their profile from MyAnimeList using jikanpy, train their specific 10 x 1 vector against the already trained 'anime_matrix', and then use that vector to predict their ratings.
The code for this is in Predict.ipynb.
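A minimal sketch of that last step, assuming the user's ratings have already been fetched and mapped to row indices of the anime matrix (the jikanpy call is left out). The function names `fit_user_vector` and `recommend`, the optimizer and the hyperparameters are hypothetical, and a random matrix stands in for the trained one.

```python
import torch

def fit_user_vector(anime_matrix, rated_idx, rated_scores, epochs=300, lr=0.05):
    """Train a single n_factors x 1 user vector while anime_matrix stays fixed."""
    n_factors = anime_matrix.shape[1]
    user_vec = torch.zeros(n_factors, 1, requires_grad=True)
    optimizer = torch.optim.Adam([user_vec], lr=lr)
    seen = anime_matrix[rated_idx]        # rows of the anime this user rated
    target = rated_scores.reshape(-1, 1)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = ((seen @ user_vec - target) ** 2).mean()
        loss.backward()
        optimizer.step()
    return user_vec.detach()

def recommend(anime_matrix, user_vec, rated_idx, k=3):
    """Return indices of the k unrated anime with the highest predicted rating."""
    scores = (anime_matrix @ user_vec).squeeze(1)
    scores[rated_idx] = float("-inf")     # never recommend what was already rated
    return torch.topk(scores, k).indices

# Toy stand-ins: 8 anime with 3 latent factors, and a user who rated the first 5.
gen = torch.Generator().manual_seed(0)
anime_matrix = torch.randn(8, 3, generator=gen)
rated_idx = torch.tensor([0, 1, 2, 3, 4])
rated_scores = (anime_matrix[rated_idx] @ torch.randn(3, 1, generator=gen)).squeeze(1)

user_vec = fit_user_vector(anime_matrix, rated_idx, rated_scores)
top = recommend(anime_matrix, user_vec, rated_idx, k=3)
```

Because only the small user vector is trained, this step is cheap compared with the full factorization.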

Things to try out

I don't think I'll work on it anytime soon, but if someone wants to take the project further, these are some first steps:

  • Add content based recommendation
  • Figure out some way to get new anime ratings without killing the MAL API :P
  • Understand and implement some of the best performing submissions from the Netflix Prize contest

Thanks for reading!
