mbozada/data-science

data-science

A repository for my independent study.

Topics

Principal Component Analysis (PCA)

Wikipedia
StatQuest
Towards Data Science - A One-Stop Shop for Principal Component Analysis by Matt Brems
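
A minimal sketch of the core idea, assuming small dense data held in plain Python lists. The first principal component is the leading eigenvector of the sample covariance matrix, and here it is found by power iteration. That is an illustrative substitute: scikit-learn's `PCA` computes components with a singular value decomposition instead. The name `first_principal_component` is hypothetical.

```python
import math


def first_principal_component(points, iterations=100):
    """Return the unit vector along which the centered data varies most.

    Sketch only: power iteration on the sample covariance matrix, which
    assumes the top eigenvalue is strictly larger than the second one.
    """
    n, d = len(points), len(points[0])
    # Center every feature on its mean.
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: multiply by the covariance matrix, then renormalize.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.0)]
pc1 = first_principal_component(points)
print(pc1)  # roughly [0.716, 0.698]: the direction of the y ≈ x trend
```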

Multidimensional Scaling (MDS)

Wikipedia
StatQuest
StackAbuse - Guide to Multidimensional Scaling in Python with Scikit-Learn by Mehreen Saeed
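
A minimal sketch of classical (Torgerson) MDS, assuming a small symmetric distance matrix. The steps are: square the distances, double-center them to recover a Gram matrix, then scale its leading eigenvector. Note that this is a different algorithm from the stress-minimizing (SMACOF) approach behind scikit-learn's `MDS`. The sketch embeds into one dimension only, which keeps the eigenvector step to a short power iteration. `classical_mds_1d` is a hypothetical name.

```python
import math


def classical_mds_1d(dist, iterations=200):
    """Embed n items on a line from an n x n distance matrix (classical MDS)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    # Double centering: B = -1/2 * J D^2 J, the Gram matrix of centered coordinates.
    b = [[-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of B.
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the matching eigenvalue.
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    # Coordinates are the eigenvector scaled by sqrt(eigenvalue).
    return [math.sqrt(eigenvalue) * x for x in v]


positions = [0.0, 1.0, 3.0, 6.0]
dist = [[abs(a - b) for b in positions] for a in positions]
coords = classical_mds_1d(dist)
print(coords)  # the original positions, up to a shift and a sign flip
```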

t-Distributed Stochastic Neighbor Embedding (t-SNE)

Wikipedia
StatQuest
Visualizing Data using t-SNE by van der Maaten & Hinton
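
t-SNE works by comparing two sets of pairwise similarities. The sketch below covers only the piece that gives the method its name. In the low-dimensional map, similarity between points i and j uses a Student t-distribution with one degree of freedom, q_ij = (1 + ||y_i - y_j||^2)^-1, normalized over all pairs, following the formulation in the paper above. It omits the high-dimensional Gaussian affinities, perplexity calibration, and the gradient descent that moves the map points. The function name is hypothetical.

```python
def low_dim_affinities(Y):
    """Student-t similarities q_ij between map points Y, normalized to sum to 1."""
    n = len(Y)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # q_ii is defined as 0
                sq_dist = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                kernel[i][j] = 1.0 / (1.0 + sq_dist)
    total = sum(sum(row) for row in kernel)  # sum over all pairs k != l
    return [[k / total for k in row] for row in kernel]


Y = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
Q = low_dim_affinities(Y)
# Nearby map points get a much larger share of the similarity mass.
print(Q[0][1] > Q[0][2])  # True
```

The heavy tail of the t-distribution is what lets moderately distant points sit far apart in the map.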

k-Means Clustering

Wikipedia
StatQuest
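
A minimal sketch of Lloyd's algorithm, the standard k-means procedure. It assigns every point to its nearest centroid, moves each centroid to the mean of its points, and repeats. The initial centroids are passed in so the example stays deterministic. Real implementations (for example scikit-learn's `KMeans`) choose them with schemes such as k-means++ and stop once assignments stop changing. Requires Python 3.8+ for `math.dist`.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm; returns (labels, centroids) after a fixed number of rounds."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
        centroids = new_centroids
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centers = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```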

Hierarchical Clustering

Wikipedia

Agglomerative

StatQuest
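
A naive sketch of agglomerative clustering with single linkage, assuming only a handful of points. Under single linkage, the distance between two clusters is the distance between their closest members. Every point starts in its own cluster, and the loop repeatedly merges the closest pair and records each merge. This brute-force loop is roughly cubic in the number of points. SciPy's `linkage` and scikit-learn's `AgglomerativeClustering` are far more efficient and offer other linkage rules.

```python
import math


def single_linkage(points):
    """Return the merge history as (sorted member indices, merge distance) pairs."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


merges = single_linkage([(0,), (1,), (5,), (7,), (20,)])
for members, distance in merges:
    print(members, distance)
```

Cutting the merge history at a chosen distance (say, between 4 and 13 here) yields a flat clustering, which is what a dendrogram visualizes.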

Divisive

Need to find a couple of resources.

Density-based Spatial Clustering of Applications with Noise (DBSCAN)

Wikipedia
StatQuest
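
A compact sketch of DBSCAN, assuming Euclidean distance and a brute-force neighbor search. A point with at least `min_pts` neighbors within `eps` (counting itself) is a core point, and clusters grow outward from core points. Border points join a cluster without expanding it, and anything unreachable is labeled noise (`-1`). scikit-learn's `DBSCAN` calls the two parameters `eps` and `min_samples`.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search; the point counts as its own neighbor.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not fixed in advance, and the isolated point is reported as noise rather than forced into a cluster.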

Linear Discriminant Analysis (LDA)

Wikipedia
StatQuest
scikit-learn
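
For two classes, Fisher's linear discriminant projects the data onto w proportional to S_W^-1 (m1 - m0). Here S_W is the within-class scatter matrix and m0, m1 are the class means. The sketch below computes that direction for two-dimensional data with an explicit 2x2 inverse. It is a simplified two-class illustration. scikit-learn's `LinearDiscriminantAnalysis` handles any number of classes and features. `fisher_direction` is a hypothetical name.

```python
def fisher_direction(class0, class1):
    """Two-class Fisher discriminant direction for 2-D points (not normalized)."""
    def mean(points):
        return [sum(p[k] for p in points) / len(points) for k in range(2)]

    def scatter(points, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter S_W = S_0 + S_1, inverted directly (2x2 case).
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, p):
    return w[0] * p[0] + w[1] * p[1]


class0 = [(1, 2), (2, 3), (3, 3)]
class1 = [(6, 5), (7, 7), (8, 6)]
w = fisher_direction(class0, class1)
print(w)  # approximately [1.0, 0.5]
print([project(w, p) for p in class0], [project(w, p) for p in class1])
```

Along `w` the two classes no longer overlap, so a single threshold on the projection separates them.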

Cluster Quality Metrics

SA vs EM vs DBI... by Georgios Drakos

Davies-Bouldin Index (DBI)

Wikipedia
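
A sketch of the Davies-Bouldin index under its usual definition, assuming Euclidean distance. For each cluster, compute its scatter s_i, the mean distance of its members to the centroid. Compare every pair of clusters with R_ij = (s_i + s_j) / d(c_i, c_j), take each cluster's worst (largest) ratio, and average those. Lower values mean tighter, better-separated clusters. scikit-learn exposes the metric as `davies_bouldin_score`.

```python
import math


def davies_bouldin(points, labels):
    clusters = sorted(set(labels))
    members = {c: [p for p, l in zip(points, labels) if l == c] for c in clusters}
    centroids = {c: [sum(x) / len(m) for x in zip(*m)] for c, m in members.items()}
    # Scatter: average distance from each member to its cluster centroid.
    scatter = {c: sum(math.dist(p, centroids[c]) for p in m) / len(m)
               for c, m in members.items()}
    worst = []
    for i in clusters:
        # Each cluster is judged by its most similar (worst-separated) neighbor.
        worst.append(max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                         for j in clusters if j != i))
    return sum(worst) / len(worst)


points = [(0, 0), (2, 0), (10, 0), (12, 0)]
print(davies_bouldin(points, [0, 0, 1, 1]))  # 0.2
```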

Silhouette Score

Wikipedia
Towards Data Science - Silhouette Coefficient by Ashutosh Bhardwaj
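
A direct sketch of the silhouette coefficient, assuming Euclidean distance and at least two clusters. For each point, a is its mean distance to the other members of its own cluster, and b is its mean distance to the members of the nearest other cluster. The point's score is (b - a) / max(a, b), and the overall score is the mean over points. Points in singleton clusters get 0, the usual convention. scikit-learn provides `silhouette_score` for real use.

```python
import math


def silhouette(points, labels):
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        dists = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if j != i:
                dists.setdefault(label, []).append(math.dist(p, q))
        if own not in dists:  # singleton cluster
            scores.append(0.0)
            continue
        a = sum(dists[own]) / len(dists[own])
        b = min(sum(d) / len(d) for label, d in dists.items() if label != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0,), (1,), (10,), (11,)]
print(silhouette(points, [0, 0, 1, 1]))  # close to 1: tight, well-separated clusters
```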

Fitting

StatQuest - Bias and Variance
IBM - Overfitting
IBM - Underfitting
StatQuest - Cross Validation

Cross-Validation

StatQuest - Cross Validation
scikit-learn
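
A minimal sketch of k-fold cross-validation without shuffling. The indices are split into k contiguous folds, with the first n mod k folds getting one extra item. Each fold is held out once: the model is fit on the rest and scored on the held-out fold. The "model" here is a deliberately trivial baseline that predicts the training mean, chosen only so the example stays self-contained. `k_fold_indices` and `cross_val_mse` are hypothetical names, loosely modeled on scikit-learn's `KFold` and `cross_val_score`.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_mse(ys, k):
    """Mean squared error of a predict-the-training-mean baseline on each fold."""
    scores = []
    for train, test in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)  # "fit" on training folds only
        scores.append(sum((ys[i] - prediction) ** 2 for i in test) / len(test))
    return scores


print(cross_val_mse([1, 2, 3, 4, 5, 6], k=3))  # [9.25, 0.25, 9.25]
```

Because the toy data is sorted, the unshuffled first and last folds force the baseline to extrapolate, which is why those two scores are much worse. Shuffling before splitting is common for this reason.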

Pipelines

scikit-learn
Medium - A Simple Guide to Pipelines by Rebecca Vickery
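
The point of a pipeline is that every preprocessing step is fit on the training data only. It is then replayed, unchanged, on new data. The sketch below imitates that fit/transform/predict pattern in plain Python. `Standardize`, `CentroidClassifier`, and `MiniPipeline` are hypothetical stand-ins written for this example, not scikit-learn classes. scikit-learn's `Pipeline` is the real version.

```python
import math


class Standardize:
    """Hypothetical transformer: zero mean, unit variance per feature (learned in fit)."""

    def fit(self, X, y=None):
        columns = list(zip(*X))
        self.means = [sum(col) / len(col) for col in columns]
        # A constant feature gets a scale of 1.0 to avoid dividing by zero.
        self.scales = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
                       for col, m in zip(columns, self.means)]
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.scales)] for row in X]


class CentroidClassifier:
    """Hypothetical estimator: predict the class whose training mean is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c])) for x in X]


class MiniPipeline:
    """Hypothetical pipeline: fit each transformer in order, then the final estimator."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        # Replay the already-fitted transforms; nothing is refit on new data.
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X_train = [[1.0, 1000.0], [2.0, 1100.0], [8.0, 1000.0], [9.0, 1100.0]]
y_train = [0, 0, 1, 1]
pipe = MiniPipeline([Standardize(), CentroidClassifier()]).fit(X_train, y_train)
print(pipe.predict([[1.5, 1050.0], [8.5, 1050.0]]))  # [0, 1]
```

Bundling the steps this way also makes cross-validation safe. When the whole pipeline is refit on each training fold, the scaler never sees the held-out fold.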

ROC and AUC

StatQuest - Confusion Matrix
StatQuest - Sensitivity and Specificity
StatQuest - ROC and AUC
scikit-learn - ROC
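
A sketch of how an ROC curve and its AUC are computed for a binary classifier. It assumes labels are 0/1 and that higher scores mean "more likely positive." Examples are sorted by score, and the threshold is lowered one distinct score at a time. At each step the (false positive rate, true positive rate) pair is recorded, and the curve is integrated with the trapezoidal rule. scikit-learn's `roc_curve` and `roc_auc_score` do this for real use. The names below are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs as the threshold sweeps from high to low scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every example sharing this score is counted.
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, scores)
print(curve)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(curve))  # 0.75
```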

Multiclass

Towards Data Science - Multiclass classification evaluation with ROC Curves and ROC AUC by Vinicius Trevisan
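
One common multiclass extension is one-vs-rest. Each class in turn is treated as "positive" against all the others, a binary AUC is computed from that class's predicted probability, and the results are averaged (macro-averaging). The sketch computes each binary AUC as the probability that a random positive outranks a random negative, with ties counting half, which equals the area under the ROC curve. The function names are hypothetical. scikit-learn's `roc_auc_score(..., multi_class="ovr")` is the library equivalent.

```python
def binary_auc(y_true, scores):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t]
    neg = [s for s, t in zip(scores, y_true) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Average of one-vs-rest AUCs; proba[i][k] is the score for classes[k]."""
    aucs = []
    for k, c in enumerate(classes):
        binary = [1 if t == c else 0 for t in y_true]
        aucs.append(binary_auc(binary, [row[k] for row in proba]))
    return sum(aucs) / len(aucs)


y_true = [0, 1, 2, 2]
proba = [[0.8, 0.1, 0.1],
         [0.2, 0.7, 0.1],
         [0.3, 0.3, 0.4],
         [0.85, 0.05, 0.1]]
print(round(macro_ovr_auc(y_true, proba, classes=[0, 1, 2]), 3))  # 0.806
```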

F1-Score

Wikipedia - F-score
Towards Data Science - F1-Score, Macro, Weighted, and Micro by Kenneth Leung
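
A sketch of the three averaging schemes for multiclass F1. Per class, it uses F1 = 2TP / (2TP + FP + FN), which is equivalent to the harmonic mean of precision and recall. Macro averages the per-class scores equally. Weighted averages them by each class's support (number of true instances). Micro pools the counts across classes first. These mirror the `average` options of scikit-learn's `f1_score`. `f1_scores` is a hypothetical name.

```python
def f1_scores(y_true, y_pred):
    """Return (macro, weighted, micro) F1 for single-label classification."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class, support = [], []
    total_tp = total_fp = total_fn = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
        support.append(tp + fn)
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_class) / len(per_class)
    weighted = sum(f * s for f, s in zip(per_class, support)) / sum(support)
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    return macro, weighted, micro


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print([round(s, 3) for s in f1_scores(y_true, y_pred)])  # [0.778, 0.815, 0.833]
```

With this imbalanced toy data, the poorly predicted minority class pulls the macro score down the most. Micro F1, which equals accuracy for single-label problems, is the least affected.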

Naive Bayes

Bayes Theorem

3Blue1Brown - Bayes Theorem
StatQuest - Bayes Theorem
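
Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), is easiest to see with a diagnostic-test calculation. The numbers below (1% prevalence, 90% sensitivity, 5% false-positive rate) are made up for illustration. The denominator P(B) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # P(positive) = P(pos | condition) P(condition) + P(pos | no condition) P(no condition)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, a positive result here implies only about a 15% chance of having the condition. The condition is rare, so false positives from the large healthy group outnumber the true positives.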
Wikipedia - Bayes Classifier

Naive Bayes Classifiers

Towards Data Science - Naive Bayes Classifier by Rohith Gandhi
scikit-learn - Naive Bayes
StatQuest - Multinomial Naive Bayes
StatQuest - Gaussian Naive Bayes
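
A sketch of a multinomial naive Bayes text classifier on a made-up toy spam example, assuming documents are already tokenized into word lists. It estimates class priors and per-class word probabilities with add-one (Laplace) smoothing. It then picks the class with the largest sum of log-probabilities. The "naive" part is treating words as independent given the class. The training data and function names are hypothetical. scikit-learn's `MultinomialNB` is the library equivalent, and its `alpha` parameter plays the role of the smoothing constant here.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Return (log priors, log word likelihoods) per class, with additive smoothing."""
    vocab = {w for d in docs for w in d}
    log_priors, log_likelihoods = {}, {}
    for c in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        log_priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        log_likelihoods[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                              for w in vocab}
    return log_priors, log_likelihoods


def predict_multinomial_nb(model, doc):
    log_priors, log_likelihoods = model
    # Sum log-probabilities; words never seen in training are ignored.
    scores = {c: log_priors[c] + sum(log_likelihoods[c][w] for w in doc if w in log_likelihoods[c])
              for c in log_priors}
    return max(scores, key=scores.get)


docs = [["dear", "friend", "lunch"], ["dear", "friend"],
        ["money", "money", "win"], ["win", "money", "dear"]]
labels = ["normal", "normal", "spam", "spam"]
model = train_multinomial_nb(docs, labels)
print(predict_multinomial_nb(model, ["dear", "friend"]))  # normal
print(predict_multinomial_nb(model, ["money", "win"]))    # spam
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. Smoothing keeps an unseen word-class pair from zeroing out a whole class.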

Bayesian Network

Wikipedia - Bayesian Network
Towards Data Science - Intro to Bayesian Networks by Devin Soni
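
A Bayesian network factorizes a joint distribution along its graph, so small queries can be answered by brute-force enumeration. The sketch uses the classic rain/sprinkler/wet-grass network, in which rain influences both the sprinkler and the grass, and the sprinkler influences the grass. The conditional probabilities are illustrative. It computes P(rain | grass is wet) by summing the factorized joint P(R) P(S | R) P(G | S, R) over the unobserved variable. Enumeration is exponential in the number of hidden variables, so this approach is for illustration only.

```python
from itertools import product

P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: {True: 0.01, False: 0.99},   # P(S | R=True)
               False: {True: 0.4, False: 0.6}}    # P(S | R=False)
P_WET = {(True, True): 0.99, (True, False): 0.9,  # P(G=True | S, R), keyed by (S, R)
         (False, True): 0.8, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    """P(R, S, G) from the network's factorization P(R) P(S | R) P(G | S, R)."""
    p_wet = P_WET[(sprinkler, rain)]
    return P_RAIN[rain] * P_SPRINKLER[rain][sprinkler] * (p_wet if wet else 1 - p_wet)


def p_rain_given_wet():
    # Sum out the sprinkler in the numerator, and both hidden variables in the evidence.
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(round(p_rain_given_wet(), 4))  # 0.3577
```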

Random Forest

Information Theory

Wikipedia - Information Theory
Khan Academy - What is information theory?
Wikipedia - Mutual Information
Wikipedia - Information Entropy
Khan Academy - Information Entropy
StatQuest - Expected Values
StatQuest - Entropy (for data science)
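
A sketch of two quantities from the readings above, measured in bits (log base 2). Shannon entropy H(X) = -sum p(x) log2 p(x) is the expected "surprise" of a distribution. Mutual information I(X; Y) = sum p(x, y) log2[p(x, y) / (p(x) p(y))] is how much knowing one variable reduces uncertainty about the other. Zero-probability terms are skipped, following the convention 0 log 0 = 0.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """Mutual information in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)


print(entropy([0.5, 0.5]))                                # 1.0 (a fair coin)
print(entropy([0.5, 0.25, 0.25]))                         # 1.5
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0 (independent)
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0 (each determines the other)
```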

Decision Trees

Wikipedia - Decision tree
StatQuest - Decision and Classification Trees
StatQuest - Regression Trees
StatQuest - Pruning Regression Trees
Towards Data Science - Gini Index vs Information Entropy by Andrew Hershy
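
A sketch of how a classification tree might choose a split on one numeric feature, assuming Gini impurity (1 - sum p_k^2) as the criterion. The loop tries a threshold halfway between each pair of adjacent distinct values. It keeps the threshold with the lowest size-weighted impurity of the two children. Entropy could be swapped in for `gini` with the same loop. Growing a full tree repeats this recursively on each child. Function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: chance two random draws (with replacement) disagree on the label."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (weighted child impurity, threshold) for the best split on one feature."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best[0]:
            best = (impurity, threshold)
    return best


print(gini([0, 1, 0, 1]))                                  # 0.5
print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (0.0, 6.5)
```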

Random Forest

StatQuest - Random Forests Part 1
StatQuest - Random Forests Part 2
Towards Data Science - Ensemble methods: bagging, boosting and stacking by Joseph Rocca
scikit-learn - Ensemble Methods
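
A deliberately small sketch of the two ideas behind a random forest, assuming numeric features. First, each tree is trained on a bootstrap sample (rows drawn with replacement). Second, each tree sees only a randomly chosen feature. Predictions are a majority vote. To stay short, every "tree" here is a one-split stump, and the feature is chosen once per tree. By contrast, scikit-learn's `RandomForestClassifier` grows deeper trees and samples candidate features at every split.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature):
    """One-split tree on one feature: (feature, threshold, left_label, right_label)."""
    best_impurity, best = gini(y), (feature, None, majority(y), majority(y))
    pairs = sorted(zip((row[feature] for row in X), y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_impurity, best = impurity, (feature, threshold, majority(left), majority(right))
    return best


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None or row[feature] <= threshold:  # None: sample had one class only
        return left_label
    return right_label


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap indices
        feature = rng.randrange(len(X[0]))                       # random feature per tree
        forest.append(fit_stump([X[i] for i in sample], [y[i] for i in sample], feature))
    return forest


def predict_forest(forest, row):
    return majority([predict_stump(stump, row) for stump in forest])


X = [[i, 10 * i] for i in range(1, 11)]
y = [0] * 5 + [1] * 5
forest = fit_forest(X, y)
print([predict_forest(forest, row) for row in ([1, 10], [10, 100])])  # [0, 1]
```

Each stump is noisy on its own because it only sees a resample of the data. Averaging many of them by vote reduces that variance, which is the core argument for bagging.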

Neural Networks

3Blue1Brown - Neural Networks Playlist
StatQuest - Neural Networks / Deep Learning Playlist
TensorFlow Playground
Towards Data Science - When Not to Use Neural Networks by Ygor Serpa
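
A sketch of what a small feed-forward network computes. It uses XOR, the classic function a single linear unit cannot represent. Two sigmoid hidden units (roughly OR and NAND) feed one sigmoid output unit (roughly AND). The weights are hand-picked for illustration rather than learned. Training with backpropagation and gradient descent would search for weights that behave like these.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One dense layer: sigmoid(w . x + b) for each unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked (not trained) weights: hidden units approximate OR and NAND,
# and the output unit approximates AND of the two.
HIDDEN_W, HIDDEN_B = [[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]
OUTPUT_W, OUTPUT_B = [[20.0, 20.0]], [-30.0]


def forward(x):
    hidden = layer(x, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x)))  # XOR truth table: 0, 1, 1, 0
```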

Datasets

UC Irvine Machine Learning Repository
Kaggle

n = 581012
d = 54

n = 100000
d = 55

n = 515345
d = 90
