This repository contains a system for evaluating metalearning systems on a meta-dataset (a dataset containing information about machine learning experiments on other datasets).
The meta-dataset was generated using the infrastructure created by the Data Driven Discovery of Models (D3M) program; see https://gitlab.com/datadrivendiscovery.
Models range from the simple (random, linear regression) to the complex (deep neural networks, AutoSKLearn) and are evaluated on various metrics (see dna/metrics.py).
- Set up the Python environment (`pip3 install -r requirements.txt`).
- `main.sh` contains examples of how to run each metalearning model. A more complete description of how to run a model can be found by running `python3 -m dna --help` or by inspecting `dna/__main__.py`.
- Models are configured using JSON files that map function names to arguments such as batch size, learning rate, and number of training epochs. Examples can be found in `model_configs/<model name>_config.json` (see the sketch after this list).
- There are two meta-datasets available: `data/complete_classification.tar.xz` and `data/small_classification.tar.xz`. The smaller is a subset of the larger, intended for development purposes. The first few lines of `main.sh` show how to use either dataset.
- Complete results of running a metalearning model on the meta-dataset are written to the directory specified by the `--output-dir` flag. They contain the arguments used to reproduce the results, model predictions, scores, plots, and any other model outputs such as model parameters.
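As a rough illustration of that shape, a config might look like the following. The function and argument names here are hypothetical, so consult the actual files in `model_configs/` for the real structure.

```json
{
    "fit": {
        "batch_size": 32,
        "learning_rate": 0.001,
        "n_epochs": 100
    }
}
```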
- Add your new model code to `dna/models/_your_model_name_.py`. It should inherit from the base classes of the tasks it can perform (`RegressionBase`, `RankingBase`, `SubsetBase`); a sketch follows this list.
- Please add tests to `tests/test_models.py`.
- Once the model inherits from those classes and overrides their methods, import the model and add it to the list found in the `get_models` function of `dna/models.py`.
- You can then run your model from the command line, or by adding it to `main.sh`.
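For orientation, a new model might be structured like this minimal sketch. The import path and the method names `fit` and `predict_regression` are assumptions, so check `RegressionBase` in this repository for the exact interface to override.

```python
# dna/models/my_model.py -- a minimal sketch, not the repository's real API.
# The import path and the method names below are assumptions; check
# RegressionBase for the exact interface to override.
from dna.models import RegressionBase


class MeanRegressionModel(RegressionBase):
    """Toy baseline: predicts the mean training score for every instance."""

    def fit(self, train_data, *args, **kwargs):
        # 'score' is a hypothetical field name standing in for whatever the
        # meta-dataset actually calls the target value.
        scores = [instance['score'] for instance in train_data]
        self.mean_score = sum(scores) / len(scores)

    def predict_regression(self, data, *args, **kwargs):
        # Return one prediction per (dataset, pipeline) instance.
        return [self.mean_score] * len(data)
```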
- Add your metric code to `dna/metrics.py` (see the sketch after this list).
- Please add tests to the file `tests/test_metrics.py`.
- The metrics are computed in `dna/problems.py`, in the appropriate problem class's `score` method; e.g., the Spearman correlation coefficient is computed with the `RankProblem`.
- You can see your metric in action by running a model from the command line, or by running `main.sh`.
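As a hedged illustration, a new metric could be a plain function over true and predicted values. The `(y_true, y_pred)` signature is an assumption here, so match the convention of the existing functions in `dna/metrics.py`.

```python
# dna/metrics.py -- sketch of a new metric; the (y_true, y_pred) signature
# is an assumption, so match the existing metric functions' convention.
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A matching test might then look like this:

```python
# tests/test_metrics.py -- a matching unit test for the sketch above.
import unittest

from dna.metrics import mean_absolute_error


class MeanAbsoluteErrorTestCase(unittest.TestCase):

    def test_known_values(self):
        self.assertAlmostEqual(mean_absolute_error([1.0, 2.0], [1.5, 1.5]), 0.5)
```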
- `model_configs_tuned` contains the model config files chosen by hyperparameter tuning.
- `./random_results_test/*/run.json.tar.gz` contains the results of the random model on the test set (see the loading snippet after this list).
- `./results_validation_set_rescore/*/run.json.tar.gz` contains the results of all non-random models on the validation set. These were used to choose the model configurations for the test set. Note that these results were rescored after more scoring metrics were added.
- `./results_test_set/*/run.json.tar.gz` contains the results of all non-random models on the test set.
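To inspect these archives, a short script like the following sketch could extract and load each run. The member name `run.json` inside each archive is an assumption based on the archive file names above.

```python
# Sketch: load the compressed run results. The member name 'run.json'
# inside each archive is an assumption based on the file names above.
import glob
import json
import tarfile

for path in glob.glob('results_test_set/*/run.json.tar.gz'):
    with tarfile.open(path, mode='r:gz') as archive:
        run = json.load(archive.extractfile('run.json'))
    print(path, sorted(run))  # e.g. the arguments, predictions, and scores
```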