
brendontan03/SC1015


Suicide Prevention Squad

About

This is a mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence). We research the possible factors that affect suicide rates and build several models to determine which factor is the most significant. Based on the insights from the exploratory data analysis and the machine learning models, we suggest policies that target specific factors in the hope of reducing suicide rates around the world. For a detailed walkthrough, please view the source code in the following order:

  1. Data Cleaning & Exploratory Data Analysis
  2. Machine Learning models
  3. Policies

Group Members

  • Brendon Tan (Data Cleaning and Exploratory Data Analysis)
  • Gerard Sin (Machine Learning)
  • Eldrick Goh (Policies and Presentation Narration)

Additionally, we would like to thank Dr Sourav Sen Gupta and TA Sun Chenyu for their help and guidance in our data science journey.

Problem Definition

  • Are we able to predict which factor has the most significant impact on suicide rates?
  • Which model would be the best to predict it?
  • Which policies are best suited to reduce suicide rates?

Datasets Used

Models Used

  1. Linear Regression
  2. Gradient Boosting
  3. Logistic Regression
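As a rough illustration of how three such models can be compared on training and test data, here is a minimal sketch using scikit-learn. It is not the project's code: the data is synthetic (`make_regression`), the helper `train_test_scores` is hypothetical, and turning the continuous target into a high/low label at the training median for logistic regression is an assumption made only so the classifier has something to predict.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): not the project's datasets.
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def train_test_scores(model, X_tr, X_te, y_tr, y_te):
    """Fit the model and return (train score, test score).

    score() is R^2 for regressors and accuracy for classifiers.
    """
    model.fit(X_tr, y_tr)
    return model.score(X_tr, y_tr), model.score(X_te, y_te)


scores = {
    "linear": train_test_scores(
        LinearRegression(), X_train, X_test, y_train, y_test
    ),
    "gradient_boosting": train_test_scores(
        GradientBoostingRegressor(random_state=0),
        X_train, X_test, y_train, y_test,
    ),
}

# Logistic regression needs a categorical target. Splitting at the
# training median is an assumption made for this sketch only.
threshold = np.median(y_train)
scores["logistic"] = train_test_scores(
    LogisticRegression(max_iter=1000),
    X_train, X_test, y_train > threshold, y_test > threshold,
)

for name, (train_score, test_score) in scores.items():
    print(f"{name}: train={train_score:.3f} test={test_score:.3f}")
```

A large gap between the training and test score for a model is the usual sign of overfitting. Note that the regressors report R^2 while the classifier reports accuracy, so the numbers are not directly comparable across the two kinds of model.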

Conclusion

  • Not all data in the datasets is relevant, so the data had to be filtered and cleaned into the desired form.
  • The main dataset alone was not sufficient for our problem, so we sourced additional datasets to supplement it.
  • Linear regression's accuracy was too low, possibly because of insufficient data.
  • Gradient boosting had a much higher accuracy on the training data than on the test data, which means the model was overfitting.
  • Logistic regression was the most reliable of the three models used.
  • Education Index is the most significant factor affecting suicide rates, based on our logistic regression model.
  • Our policies therefore focus on how to improve the Education Index of countries.
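One common way to read off the "most significant factor" from a logistic regression is to standardise the features and rank them by the absolute size of their coefficients. The sketch below shows that idea only; it is not the project's code or data. Apart from `education_index`, the column names are hypothetical, the values are randomly generated, and the label is deliberately constructed so that `education_index` dominates, so the ranking it prints says nothing about the real results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Randomly generated stand-in features (assumption). Only the name
# "education_index" comes from the README; the others are hypothetical.
features = pd.DataFrame({
    "education_index": rng.uniform(0.3, 0.95, n),
    "gdp_per_capita": rng.lognormal(9.0, 1.0, n),
    "unemployment_rate": rng.uniform(2.0, 15.0, n),
})

# Synthetic binary label, built on purpose so that education_index
# drives it; this is a property of the toy data, not a finding.
education_z = (
    features["education_index"] - features["education_index"].mean()
) / features["education_index"].std()
high_rate = ((3.0 * education_z + rng.normal(0.0, 1.0, n)) > 0).astype(int)

# Standardising first puts the coefficients on a comparable scale.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(features, high_rate)

coefficients = pipeline.named_steps["logisticregression"].coef_[0]
ranking = pd.Series(
    np.abs(coefficients), index=features.columns
).sort_values(ascending=False)
print(ranking)
```

Without the scaling step, a feature measured in large units (such as GDP per capita) would get a tiny coefficient regardless of how much it matters, so raw coefficients should not be compared directly.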

What we learnt from the project

  • Formatting and cleaning datasets
  • Pyplot for Exploratory Data Analysis
  • Logistic Regression from scikit-learn
  • Gradient Boosting
  • Organising code and data in a GitHub repository, and writing a README

