This track will allow you to explore interesting datasets using machine learning and other data science techniques. The goal is simple: find something interesting in your data and present it in a compelling manner. Projects with a social impact are especially encouraged.
The goal of this track is to use machine learning and data science techniques in a creative way to discover meaningful and impactful insights from data. You can focus on any aspect of machine learning or data science and present data exploration results, modeling results, or even a new data visualization. You should work with a dataset that you have never explored before.
For the presentation, you should state the question you are addressing, describe the data used, discuss the technical approaches employed, and clearly present your findings along with the significance and impact of your results. You should also be prepared to show your code; we strongly encourage using Jupyter notebooks or R Markdown.
This track will be judged based on creativity, technical data science approach, significance and impact of findings, and presentation quality.
Here are some links to publicly available data repositories:
- Urban Data Platform - Great data on the city of Houston
- Kaggle - A variety of machine learning datasets and competitions
- UC Irvine ML Repository - Many datasets commonly used for machine learning tasks
- Network Repository - Great repository of network data (e.g. social networks, bio-networks, etc.)
Projects will be judged on the following criteria:
- Creativity.
- Technical data science or machine learning approach.
- Significance and impact of findings. (Is there a social good aspect?)
- Presentation quality.
Huge thanks to Genevera Allen and the D2K Lab for contributing this track!