Data Science portfolio

Artificial Neural Networks

The goal of this project is to build an Artificial Neural Network that recognizes objects in images captured with a webcam.

Throughout this project I implemented a feed-forward neural network with backpropagation from scratch on the make_moons dataset, trained a Convolutional Neural Network with Keras on the MNIST dataset, and finally classified the webcam images with pre-trained networks (VGG16, MobileNetV2).
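For illustration, here is a minimal sketch of the last step with Keras, assuming an ImageNet-pretrained MobileNetV2 and a saved webcam capture (the file name is a placeholder, not the notebook's actual code):

```python
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = MobileNetV2(weights="imagenet")            # ImageNet-pretrained weights

# "webcam_capture.jpg" is a placeholder for an image saved from the webcam
img = image.load_img("webcam_capture.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])         # top-3 (class, label, score)
```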

To Do:

  • Update and clean the notebooks
  • Explore further the hyperparameters of the networks

Markov Chain Monte Carlo (MCMC) Simulation

In this project, I teamed up with my colleague Moritz von Ketelhodt to write a program that simulates and predicts customer behaviour between the departments/aisles of a supermarket, applying Markov Chain modelling and Monte Carlo simulation.

The project involved the following tasks:

Data Wrangling

See the notebook/supermarket_data_wrangling.ipynb.

Data Analysis and Exploration

See the notebook/supermarket_EDA.ipynb.

Calculating Transition Probabilities between the aisles (5x5 crosstab)

See the notebook/customer_transition_matrix.ipynb.
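A minimal sketch of this step with pandas, using toy data in place of the wrangled supermarket table (the column and aisle names are illustrative, not the notebook's):

```python
import pandas as pd

# Toy data standing in for the wrangled supermarket data (illustrative only)
df = pd.DataFrame({
    "customer_no": [1, 1, 1, 2, 2],
    "location":    ["dairy", "spices", "checkout", "fruit", "checkout"],
})

# For each customer, the aisle visited in the next time step
df["next_location"] = df.groupby("customer_no")["location"].shift(-1)

# Row-normalised crosstab: P(next aisle | current aisle);
# on the full data this yields the 5x5 transition matrix
transition_matrix = pd.crosstab(df["location"], df["next_location"],
                                normalize="index")
print(transition_matrix.round(2))
```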

Creating a Customer Class

See the notebook/customer_class.ipynb.
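A minimal sketch of such a class, driven by the row of the transition matrix for the current aisle (attribute names and the initial state are assumptions, not the project's exact implementation):

```python
import numpy as np

class Customer:
    """One supermarket customer moving between aisles via a Markov Chain."""

    def __init__(self, customer_id, transition_matrix, state="fruit"):
        self.customer_id = customer_id
        self.transition_matrix = transition_matrix  # DataFrame, rows sum to 1
        self.state = state                          # current aisle

    def is_active(self):
        return self.state != "checkout"

    def next_state(self):
        """Draw the next aisle from the row of the current state."""
        probs = self.transition_matrix.loc[self.state]
        self.state = np.random.choice(probs.index, p=probs.values)
        return self.state
```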

Running an MCMC (Markov Chain Monte Carlo) simulation for a single customer of the Customer class

See the simulation/customer_class_one_customer_simulation_ES.py.

Extending the simulation to multiple customers

See the simulation/one_script.py.
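A minimal sketch of the multi-customer loop, reusing the Customer class and transition matrix sketched above (the number of customers and time steps are arbitrary, not the script's actual values):

```python
# Several customers walking through the supermarket in parallel
customers = [Customer(i, transition_matrix) for i in range(5)]

for minute in range(15):                      # simulate 15 time steps
    for customer in customers:
        if customer.is_active():              # stop once a customer checks out
            customer.next_state()
    print(f"t={minute:02d}", [c.state for c in customers])
```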

To Do:

  • Visualization of the supermarket layout and the simulation of the customer behaviour based on the transition probabilities
  • Displaying the avatars at the exit location
  • Displaying the paths of the avatars' movement between the locations

Supervised Machine Learning: Classification - Kaggle's Titanic Challenge

This project approaches a classic Machine Learning problem: building a classification model to predict the survival of Titanic passengers based on the features in the dataset of Kaggle's Titanic - Machine Learning from Disaster.

Based on the Exploratory Data Analysis (plotting missing values and the correlation between survival and the different data categories), I selected the most significant features and dropped the ones that do not contribute to an accurate prediction.

Scikit-learn's LogisticRegression and RandomForestClassifier models were trained on the data.
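A minimal sketch of this modelling step, assuming a preprocessed feature matrix X and target vector y from the EDA above (the split, hyperparameters and accuracy print are illustrative, not the notebook's actual code):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# X: selected features, y: survival labels (assumed to be prepared beforehand)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))  # test accuracy
```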

Data source: Kaggle: Titanic - Machine Learning from Disaster.

Supervised Machine Learning: Regression - Bicycle Rental Forecast

The goal of this project is to build a regression model that predicts the total number of rented bicycles per hour from time and weather features, optimizing the model for the RMSLE metric, using Kaggle's "Bike Sharing Demand" dataset, which provides hourly rental data spanning two years.

After extracting datetime features, highly correlated variables were dropped via feature selection (correlation analysis, Variance Inflation Factor) to avoid multicollinearity. I then compared several regression models (PoissonRegressor, PolynomialFeatures with linear regression, Lasso, Ridge, RandomForestRegressor) based on their R2 and RMSLE scores.
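A minimal sketch of scoring one of these models on R2 and RMSLE with scikit-learn, assuming train/test splits of the engineered features already exist (variable names are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_log_error, r2_score

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

y_pred = np.clip(model.predict(X_test), 0, None)   # rental counts cannot be negative

rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred))
print(f"R2:    {r2_score(y_test, y_pred):.3f}")
print(f"RMSLE: {rmsle:.3f}")
```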

Data source: Kaggle: Bike Sharing Demand.

Natural Language Processing (NLP): Text Classification

The main goal of this project was to build a text classification model on song lyrics that predicts the artist from a piece of text, and additionally to make user input (artists, lyrics) possible via a CLI.

Through web scraping with BeautifulSoup, the song lyrics of selected artists are extracted from lyrics.com. I built two functions for handling the scraped data: one extracts the song lyrics directly from the HTML, the other downloads the page of every song-lyrics URL and saves the lyrics locally as .txt files. In either case, all lyrics are then loaded from the .txt files to create the corpus.
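A minimal sketch of the download-and-save variant with requests and BeautifulSoup; the tag used to locate the lyrics body is an assumption about the page markup, not the project's actual scraper:

```python
import requests
from bs4 import BeautifulSoup

def save_lyrics(url, filename):
    """Download one song-lyrics page and save the lyrics text to a .txt file."""
    html = requests.get(url).text
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("pre")                     # assumed tag holding the lyrics
    if body is not None:
        with open(filename, "w", encoding="utf-8") as f:
            f.write(body.get_text())
```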

In the model pipeline, TfidfVectorizer (TF-IDF) transforms the words of the corpus into a matrix, count-vectorizing and normalizing them in a single step by default. For classification, the multinomial Naive Bayes classifier MultinomialNB() was used, which is suited to discrete features such as word counts in text classification.
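A minimal sketch of that pipeline with scikit-learn, assuming `corpus` (a list of lyrics strings) and `artists` (the matching list of labels) have been loaded from the .txt files; the hyperparameters are illustrative:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# TF-IDF features feeding a multinomial Naive Bayes classifier
pipeline = make_pipeline(TfidfVectorizer(stop_words="english"),
                         MultinomialNB(alpha=0.1))
pipeline.fit(corpus, artists)

print(pipeline.predict(["here comes the sun"]))  # predicted artist for a snippet
```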

To Do:

  • Text pre-processing with the word tokenizer and lemmatizer of the Natural Language Toolkit (NLTK) in order to "clean" the extracted texts
  • Debug CLI

Time Series Analysis: Temperature Forecast

For this project, I applied an ARIMA model for a short-term temperature forecast. After visualizing the trend, the seasonality and the remainder of the time series data, I inspected the lags of the Autocorrelation (ACF) and Partial Autocorrelation (PACF) plots to determine the parameters (p, d, q) of the ARIMA model, and ran tests such as ADF and KPSS to check stationarity (time dependence).
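A minimal sketch of the stationarity tests and the ARIMA fit with statsmodels, assuming `temperature` is a pandas Series indexed by date; the order (1, 1, 1) is only a placeholder for the values read off the ACF/PACF plots:

```python
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.arima.model import ARIMA

print("ADF p-value: ", adfuller(temperature)[1])  # H0: unit root (non-stationary)
print("KPSS p-value:", kpss(temperature)[1])      # H0: series is stationary

model = ARIMA(temperature, order=(1, 1, 1)).fit() # (p, d, q) from ACF/PACF
forecast = model.forecast(steps=14)               # short-term (14-step) forecast
print(forecast)
```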

Data source: European Climate Assessment Dataset.

Unsupervised Learning: Recommender Systems

This project is a movie recommender with a web interface. The recommender is based on the NMF (Non-negative Matrix Factorization) approach: from the existing ratings it predicts ratings for movies a new, similar user has not seen yet and recommends the movies that user would most likely appreciate. It is trained on the 'small' MovieLens dataset.
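A minimal sketch of the NMF step with scikit-learn, assuming `ratings` is the user-movie rating matrix built from the MovieLens data with missing ratings already imputed (e.g. with movie means); the number of components is illustrative:

```python
from sklearn.decomposition import NMF

nmf = NMF(n_components=20, max_iter=500)
P = nmf.fit_transform(ratings)      # user-feature matrix
Q = nmf.components_                 # feature-movie matrix

R_hat = P @ Q                       # reconstructed (predicted) ratings
# For a new user, transform their filled rating vector with nmf.transform()
# and rank the movies they have not seen by the predicted ratings.
```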

To Do:

  • Finish and clean the code for the Flask app
  • Use Streamlit to re-create the app

All projects were developed within the scope of the Data Science Bootcamp of Spiced Academy.
