Letterboxd Project Analyzing Movie Preferences

1. Components

Data collection: scraping the Letterboxd website, fetching movie data through The Movie Database (TMDb) API, and gathering script data from various websites
Exploratory data analysis (EDA) on user movie data
Topic modeling on movie scripts

2. Status

On major components:

Specific EDA goals:

Runtime histogram
Summary statistics on ratings
Pairwise scatter plots of independent variables
Language distribution + analysis on imbalanced language classes
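A minimal sketch of three of these EDA goals, assuming the user movie data sits in a pandas DataFrame. The column names (`rating`, `runtime`, `language`) and the sample values are hypothetical and may not match the notebook.

```python
import pandas as pd

# Hypothetical sample; the real notebook's column names and values may differ.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D", "E"],
        "rating": [3.5, 4.0, 2.5, 5.0, 4.0],
        "runtime": [95, 120, 88, 142, 101],
        "language": ["en", "en", "fr", "en", "ja"],
    }
)

# Summary statistics on ratings (count, mean, std, min, quartiles, max).
rating_summary = movies["rating"].describe()

# Share of films per language; one dominant class signals imbalance.
language_share = movies["language"].value_counts(normalize=True)

# Runtime histogram as counts per 30-minute bin (no plotting library needed).
runtime_bins = pd.cut(movies["runtime"], bins=[60, 90, 120, 150])
runtime_hist = runtime_bins.value_counts().sort_index()

print(rating_summary)
print(language_share)
print(runtime_hist)
```

With matplotlib installed, `movies["runtime"].plot.hist()` would draw the same histogram as a figure.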

For ordinal logistic regression, implement:

Verification of the ordinality assumption, by plotting the mean of predictor X stratified by levels of response Y (using a boxplot)
Score residual plots to verify parallelism
Partial residual plots to verify linearity and parallelism
Li and Shepherd residuals to verify the functional form of predictors
Verification of the proportional odds (PO) assumption for each predictor separately (by comparing logits of proportions of the form Y >= j)
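The last check can be sketched as follows, assuming the predictor has already been binned into groups. The function name `cumulative_logits`, the group labels, and the toy ratings are all hypothetical. For each group, the sketch computes the logit of the proportion with Y >= j at every cutpoint j; under proportional odds, the gap between two cutpoints should be roughly the same in every group.

```python
import math
from collections import defaultdict


def cumulative_logits(x_groups, y, levels):
    """Return {group: {j: logit(P(Y >= j | group))}} for each cutpoint j.

    `levels` is the ordered list of response values. The lowest level is
    skipped because P(Y >= lowest) is always 1.
    """
    by_group = defaultdict(list)
    for group, response in zip(x_groups, y):
        by_group[group].append(response)

    result = {}
    for group, responses in by_group.items():
        n = len(responses)
        result[group] = {}
        for j in levels[1:]:
            p = sum(r >= j for r in responses) / n
            # A proportion of 0 or 1 has no finite logit; mark it as missing.
            result[group][j] = math.log(p / (1 - p)) if 0 < p < 1 else float("nan")
    return result


# Toy data (hypothetical): ratings on a 1-3 scale, runtime binned into two groups.
x_groups = ["short"] * 5 + ["long"] * 5
y = [1, 1, 2, 2, 3, 1, 2, 2, 3, 3]

logits = cumulative_logits(x_groups, y, levels=[1, 2, 3])

# Gap between the Y >= 2 and Y >= 3 logits within each group.
gaps = {group: vals[2] - vals[3] for group, vals in logits.items()}
print(gaps)
```

In this toy data the two gaps coincide, which is the pattern the PO assumption predicts. Gaps that differ widely between groups would suggest the assumption fails for that predictor.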

On minor improvements:

Supplement genre information using Letterboxd website when TMDb API fails
Convert genre information from a list of strings to an indicator matrix
Put all helper functions into a separate Python script that is imported into the notebook
Improve initialization of the TMDb data array to reduce the number of if-else statements in the data extraction code
Improve graph colors, titles, and legends
Rewrite genre counts using the pandas built-in methods sort_values and reindex
Implement imputation for missing data
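Two of the items above, the genre indicator matrix and the genre counts, could look roughly like this in pandas. This is a sketch under assumptions: the `genres` column name, the sample films, and the `all_genres` ordering are hypothetical, and the notebook's actual data layout may differ.

```python
import pandas as pd

# Hypothetical sample; in the project the genre lists come from TMDb.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "genres": [["Drama", "Comedy"], ["Drama"], ["Horror", "Drama"]],
    }
)

# One row per (film, genre) pair, then one 0/1 column per genre.
exploded = movies["genres"].explode()
indicators = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

# Genre counts, sorted from most to least common.
genre_counts = indicators.sum().sort_values(ascending=False)

# reindex imposes a fixed genre order and fills absent genres with 0.
all_genres = ["Comedy", "Drama", "Horror", "Western"]
genre_counts_fixed = genre_counts.reindex(all_genres, fill_value=0)

print(indicators)
print(genre_counts_fixed)
```

Grouping by the original row index after `explode` collapses the per-genre rows back into one row per film, so the indicator matrix lines up with `movies`.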

3. Instructions

To run the notebook:

Create a virtual environment using the following command in PowerShell:

py -m venv directory_name

Activate the virtual environment using the following command in PowerShell:

directory_name\Scripts\Activate.ps1

With your virtual environment active, run the following command to install all required dependencies:

py -m pip install -r requirements.txt

Check that the notebook's Python kernel is set to the virtual environment's Python kernel!
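One way to confirm the kernel from inside the notebook is to run a short cell like the sketch below. The helper name `running_in_venv` is hypothetical; it relies only on the standard library's `sys.prefix` and `sys.base_prefix`, which differ when the interpreter belongs to a virtual environment.

```python
import sys


def running_in_venv():
    """Return True when the interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", running_in_venv())
```

If the printed interpreter path does not sit under your virtual environment's directory, the notebook is likely still using a different kernel.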

For dev work:

Remember to run the following command in your virtual environment (it saves the installed packages used in dev work to requirements.txt):

py -m pip freeze > requirements.txt
