Mathematics for Computer Science Homework: Statistical Analysis of Obesity Levels Dataset

For my Mathematics for Computer Science course, we had to put our theoretical knowledge into practice in code (Python). For this report, we used the dataset from the paper "Estimation of Obesity Levels Based On Eating Habits and Physical Condition", which contains synthetic data on individuals from Mexico, Peru and Colombia. It has 17 features and 2111 records, labeled with a class variable grouped into 4 types (7 class values in total):

Insufficient Weight
Normal Weight
Overweight (Level I, Level II)
Obesity Type I, Type II and Type III

Specific tasks

This assignment was thorough in terms of the methods used, the explanation of each code cell, and the comparison of results obtained with different functions where needed. It was developed in Jupyter Notebooks on Google Colab, using libraries such as NumPy, SciPy, Pandas and StatsModels. Every feature was assigned to one of 4 types:

Continuous
Integer (Discrete)
Binary
Categorical
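
The split into feature types can be roughly automated. The sketch below is a hypothetical heuristic (`classify_feature`, `classify_features` and the `max_categories` threshold are my own assumptions, not the rule used in the notebook): it labels a column by its dtype and number of distinct values. Synthetic rows may store discrete answers as non-integer floats, so a rule like this can misclassify some columns and a manual review is still needed. The toy column names are assumed to match the dataset header; the values are made up.

```python
import pandas as pd


def classify_feature(series: pd.Series, max_categories: int = 10) -> str:
    """Heuristically label a column as binary, categorical, integer or continuous.

    Assumption: this rule of thumb is illustrative only, not the notebook's method.
    """
    values = series.dropna()
    n_unique = values.nunique()
    if n_unique == 2:
        return "binary"
    if not pd.api.types.is_numeric_dtype(values):
        return "categorical"
    # Numeric columns holding only whole numbers with few distinct values count as discrete.
    if (values % 1 == 0).all() and n_unique <= max_categories:
        return "integer"
    return "continuous"


def classify_features(df: pd.DataFrame) -> dict[str, str]:
    """Map every column name to its heuristic feature type."""
    return {col: classify_feature(df[col]) for col in df.columns}


# Toy example: column names assumed from the dataset, values invented for illustration.
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [21.0, 23.5, 27.0, 19.2],
    "NCP": [3, 3, 1, 4],
    "MTRANS": ["Walking", "Bike", "Automobile", "Motorbike"],
})
print(classify_features(toy))
```

On the real data the same call would be applied to `pd.read_csv("ObesityDataSet_raw_and_data_sinthetic.csv")` after dropping the target column.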

Depending on each feature's data type, different statistical methods were applied to get better results. Some of these methods are:

Probability Mass Function (PMF)
Probability Density Function (PDF)
Cumulative Distribution Function (CDF)
Analysis of Variance (ANOVA)
False Discovery Rate (FDR)
Bootstrap
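
A minimal sketch of some of the distribution tools listed above, written with NumPy and SciPy and run on made-up stand-in data rather than the real dataset: an empirical PMF for discrete features, an empirical CDF, a PDF estimate, and a percentile bootstrap confidence interval for the mean. The function names are hypothetical, and estimating the PDF with a Gaussian kernel density estimate (`scipy.stats.gaussian_kde`) is my own choice; the notebook may fit a parametric distribution instead.

```python
import numpy as np
from scipy import stats


def empirical_pmf(values):
    """Relative frequency of each distinct value (for discrete features)."""
    support, counts = np.unique(values, return_counts=True)
    return support, counts / counts.sum()


def empirical_cdf(values):
    """Sorted values and the fraction of observations <= each of them."""
    x = np.sort(np.asarray(values, dtype=float))
    # side="right" counts ties correctly: all copies of a value get the same CDF.
    return x, np.searchsorted(x, x, side="right") / x.size


def kde_pdf(values, grid):
    """Gaussian kernel density estimate of the PDF (for continuous features)."""
    return stats.gaussian_kde(values)(grid)


def bootstrap_mean_ci(values, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row of idx is one resample drawn with replacement.
    idx = rng.integers(0, values.size, size=(n_resamples, values.size))
    means = values[idx].mean(axis=1)
    low, high = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return low, high


# Stand-in data (randomly generated, not from the dataset).
rng = np.random.default_rng(42)
toy_meals = rng.integers(1, 5, size=200)         # a discrete feature with values 1..4
toy_heights = rng.normal(1.70, 0.09, size=200)   # a continuous feature in metres

support, pmf = empirical_pmf(toy_meals)
x, cdf = empirical_cdf(toy_heights)
density = kde_pdf(toy_heights, np.linspace(1.4, 2.0, 50))
low, high = bootstrap_mean_ci(toy_heights)
print(support, pmf.round(3))
print(f"95% bootstrap CI for the mean height: ({low:.3f}, {high:.3f})")
```

To describe the features conditioned on obesity level, the same functions can be applied to each class subset, for example inside a pandas `groupby` on the target column.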

Question and Goals

Based on the information in the dataset, the research question is: "How do eating habits and physical activity patterns influence obesity levels among individuals?".

From this general research question, and based on the 6 main tasks related to statistical analysis, the general objectives are:

Describe the distributional behavior of all features, conditioned on obesity levels (target).
Statistically assess how eating habits and physical activity differ across obesity levels using hypothesis testing for continuous, binary and categorical variables.
Evaluate the discriminative ability of each feature by identifying which behaviors and physical activity patterns best predict obesity levels.

All solutions, graphs and comparisons can be found in the file annotated-Statistics.pdf. The code is in the Jupyter notebook (.ipynb file), which contains all the necessary code cells 😀.

Repository files

ObesityDataSet_raw_and_data_sinthetic.csv: the dataset
README.md: this file
annotated-Statistics.pdf: solutions, graphs and comparisons
math.ipynb: code cells (Jupyter notebook)
Repository: camilacastano/statistical-analysis-obesity-levels