This is one of the assignments that my group and I did for the Machine Learning course at the University of Groningen.
The assignment required us to choose one of the datasets from sklearn.datasets and select a machine learning model appropriate for the task (classification or regression). We then had to train the model first on 10% of the training dataset, then on 30%, then on 50%, and finally on the entire training dataset, making sure that each model reached the best performance we could get (by tuning hyperparameters, for instance). We chose the breast cancer dataset, so we were working with a classification task. We did a bit more than the assignment asked for: instead of choosing one algorithm (for instance, Decision Tree Classifier) and varying only the size of the training set, we chose three types of classifiers (SVC, Decision Tree, and Random Forest), built a hyperparameter grid for each, and found the best hyperparameters for each algorithm at each training size.
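
The procedure described above can be sketched roughly as follows. This is a minimal illustration and not the code we submitted: the 80/20 train/test split, the grid values, the 3-fold cross-validation, the accuracy metric, the feature scaling for SVC, and the fixed `random_state` are all assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Load the data and hold out a fixed test set (assumed 80/20 split).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One estimator and one (illustrative, deliberately small) grid per classifier.
# SVC is wrapped in a pipeline with scaling, which is an assumption of this sketch.
models = {
    "SVC": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    ),
    "DecisionTree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]},
    ),
    "RandomForest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}

results = {}
for fraction in (0.1, 0.3, 0.5, 1.0):
    # Take a stratified subsample of the training set; 1.0 means all of it.
    if fraction < 1.0:
        X_sub, _, y_sub, _ = train_test_split(
            X_train, y_train, train_size=fraction, stratify=y_train, random_state=0
        )
    else:
        X_sub, y_sub = X_train, y_train

    for name, (estimator, grid) in models.items():
        # Tune hyperparameters by cross-validation on the subsample only,
        # then score the refitted best model on the untouched test set.
        search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
        search.fit(X_sub, y_sub)
        test_accuracy = search.score(X_test, y_test)
        results[(name, fraction)] = (search.best_params_, test_accuracy)

for (name, fraction), (params, accuracy) in results.items():
    print(f"{name:12s} {fraction:4.0%}  acc={accuracy:.3f}  {params}")
```

Keeping the test set fixed while only the training subsample changes means the scores for 10%, 30%, 50%, and 100% can be compared directly for each classifier.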