aishrock006/Capstone-Project

Machine Learning Engineer Nanodegree 2020

Capstone Project

Time Series Analysis of Air Quality Data


Install
This project requires Python 3 and the following Python libraries installed:


* fbprophet
* matplotlib
* statsmodels
* NumPy
* Pandas
* scikit-learn

You will also need software that can run a Jupyter (IPython) Notebook.
We recommend installing Anaconda, a pre-packaged Python distribution that includes most of the necessary libraries and software for this project.
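
As a quick way to confirm the environment is ready, the sketch below checks whether each library listed above can be found. The import names are assumptions based on how these packages are usually imported (for example, scikit-learn is imported as `sklearn`), and `missing_packages` is a hypothetical helper, not part of this project.

```python
from importlib.util import find_spec

# Import names for the libraries listed above.
# Assumption: scikit-learn is imported as "sklearn" and NumPy/Pandas in lowercase.
REQUIRED = ["fbprophet", "matplotlib", "statsmodels", "numpy", "pandas", "sklearn"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```

Any name printed as missing can then be installed with the package manager of your choice before opening the notebook.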

Code
The complete notebook is divided into the following parts:
* Part 1: Data Preprocessing
* Part 2: Data Modelling with ARIMA
* Part 3: Data Modelling with FBProphet
* Part 4: Data Modelling with Holt-Winters
* Part 5: Training with LSTM
* Part 6: Final Model Evaluation and Prediction

Data
The dataset consists of air quality data from 12 sites in Beijing. The main pollutants are PM2.5, PM10, SO2, NO2, CO, and O3.
The original raw data (for all 12 sites) is available in the /beijing-multisite-airquality-data-set/ directory.

* Best selected dataset is available in /dataset/
* Processed daily Train data is available in /dataset/daily/Train
* Processed daily Test data is available in /dataset/daily/Test
* Processed monthly Train data is available in /dataset/monthly/Train
* Processed monthly Test data is available in /dataset/monthly/Test
* Evaluation metric results are available in /result
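
The daily and monthly Train/Test files listed above can be illustrated with a minimal sketch of how hourly readings might be aggregated and split. This is not the notebook's actual preprocessing: the synthetic values, the `PM2.5` column name, the use of the mean as the aggregate, the 80/20 split, and the `chronological_split` helper are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings standing in for one site's raw data (values are made up).
index = pd.date_range("2020-01-01", periods=24 * 90, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
hourly = pd.DataFrame({"PM2.5": rng.uniform(10, 200, size=len(index))}, index=index)

# Assumption: daily and monthly series are averages of the hourly readings.
daily = hourly.resample("D").mean()
monthly = hourly.resample("MS").mean()  # "MS" labels each month by its first day


def chronological_split(frame, train_fraction=0.8):
    """Split a time-indexed frame into train/test without shuffling."""
    cut = int(len(frame) * train_fraction)
    return frame.iloc[:cut], frame.iloc[cut:]


train, test = chronological_split(daily)
print(len(daily), len(monthly), len(train), len(test))
```

Splitting by position rather than at random keeps every test observation later than every training observation, which is what a forecasting evaluation needs.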