
Application of statistical methods for the classification of features using a dataset from UC Irvine Machine Learning Repository.


camilacastano/statistical-analysis-obesity-levels


Mathematics for Computer Science Homework: Statistical Analysis of Obesity Levels Dataset

For my Mathematics for Computer Science course, we were asked to turn our theoretical knowledge into Python code. This report uses the dataset from the paper "Estimation of Obesity Levels Based On Eating Habits and Physical Condition", which contains synthetic data on individuals from Mexico, Peru and Colombia. The dataset has 17 features and 2111 records, each labeled with a class variable grouped into 4 types:

  • Insufficient Weight
  • Normal Weight
  • Overweight (Level I, Level II)
  • Obesity (Type I, Type II and Type III)

Specific tasks

The assignment was thorough in its choice of methods, its explanation of each code cell, and its comparison of results from different functions where needed. It was developed in Jupyter Notebooks on Google Colab with libraries such as NumPy, SciPy, Pandas and StatsModels. Every feature was assigned to one of 4 types:

  • Continuous
  • Integer (Discrete)
  • Binary
  • Categorical
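As a rough illustration of this split, the sketch below assigns a column to one of the four types using a simple rule. The rule, the function name `classify_feature`, the column names and the values are all assumptions made for this example; the notebook may separate the features differently (for instance by hand).

```python
from numbers import Real


def classify_feature(values):
    """Assign a column to one of four types using a hypothetical rule."""
    distinct = set(values)
    # Exactly two distinct values (numeric or text) is treated as binary.
    if len(distinct) == 2:
        return "binary"
    # Any non-numeric value makes the column categorical.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "categorical"
    # Numeric columns holding only whole numbers are treated as discrete.
    if all(float(v).is_integer() for v in values):
        return "integer"
    return "continuous"


# Made-up columns for illustration only; they are not rows from the dataset.
columns = {
    "age": [21.0, 23.5, 27.2, 19.8],
    "meals_per_day": [3, 1, 4, 3],
    "smokes": ["no", "yes", "no", "no"],
    "transport": ["Walking", "Bike", "Automobile", "Walking"],
}

feature_types = {name: classify_feature(vals) for name, vals in columns.items()}
print(feature_types)
```

A rule like this is only a starting point: a numeric column with few distinct values could reasonably be treated as either integer or categorical, depending on the analysis.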

The data type of each feature determined which statistical methods were applied to it. Some of the methods used:

  • Probability Mass Function (PMF)
  • Probability Density Function (PDF)
  • Cumulative Distribution Function (CDF)
  • Analysis of Variance (ANOVA)
  • False Discovery Rate (FDR)
  • Bootstrap
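To make a few of these methods concrete, here is a minimal sketch of an empirical PMF, an empirical CDF and a percentile bootstrap confidence interval for a mean, written with the standard library only. The function names and the sample values are assumptions for illustration and are not taken from the notebook. For continuous features, a density estimate such as `scipy.stats.gaussian_kde` would play the role of the PDF.

```python
import random
from collections import Counter
from statistics import mean


def empirical_pmf(samples):
    """Relative frequency of each distinct value (for discrete features)."""
    counts = Counter(samples)
    n = len(samples)
    return {value: counts[value] / n for value in sorted(counts)}


def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def bootstrap_ci_mean(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement, recompute the mean, then sort the results.
    means = sorted(mean(rng.choices(samples, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up discrete feature (e.g. number of main meals per day).
meals = [3, 3, 1, 4, 3, 2, 3, 1]

pmf = empirical_pmf(meals)
cdf_at_2 = empirical_cdf(meals, 2)
ci_low, ci_high = bootstrap_ci_mean(meals)

print(pmf)
print(cdf_at_2)
print(ci_low, ci_high)
```

The bootstrap makes no distributional assumption about the feature, which is one reason it is useful alongside parametric tests such as ANOVA.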

Question and Goals

Based on the information in the dataset, the research question is: "How do eating habits and physical activity patterns influence obesity levels among individuals?"

From this research question and the 6 main statistical analysis tasks, the general objectives are:

  • Describe the distributional behavior of all features, conditioned on obesity levels (target).
  • Statistically assess how eating habits and physical activity differ across obesity levels using hypothesis testing for continuous, binary and categorical variables.
  • Evaluate the discriminative ability of each feature by identifying which behaviors and physical activity patterns best predict obesity levels.
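The hypothesis-testing objective can be sketched as follows: compute a one-way ANOVA F statistic for a continuous feature across groups, then apply the Benjamini-Hochberg procedure to the p-values of several features to control the false discovery rate. This is an assumed workflow with made-up numbers, not the notebook's code. In practice `scipy.stats.f_oneway` returns both the F statistic and its p-value, and `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"` performs the correction.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def benjamini_hochberg(p_values, q=0.05):
    """Return one reject flag per hypothesis, controlling the FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value is at most (rank / m) * q.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


# Made-up values of one continuous feature in three obesity-level groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)

# Made-up p-values, one per feature, as if from separate tests.
p_values = [0.01, 0.02, 0.03, 0.5]
rejected = benjamini_hochberg(p_values)

print(f_value)
print(rejected)
```

Correcting across features matters here because testing many features at once inflates the chance that at least one looks significant by accident.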

All solutions, graphs and comparisons are in the file annotated-Statistics.pdf. The code is in a .ipynb file that contains all the code cells 😀.
