This repository complements the paper Missing Data Imputation using Optimal Transport (Muzellec B., Josse J., Boyer C., Cuturi, M.):
- experiment.py reproduces the imputation benchmark from the paper;
- imputers.py contains the classes corresponding to Algorithms 1 and 3;
- data_loaders.py contains data loading utilities for the UCI ML repository datasets on which the experiments are run;
- utils.py contains general-purpose utilities, in particular the implementation of the MAR and MNAR missing data mechanisms;
- softimpute.py contains the implementation of the SoftImpute baseline.
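
For readers unfamiliar with missing data mechanisms, the following is a minimal sketch of how a MAR (missing at random) mask could be generated with NumPy. It is an illustration only and not the code in utils.py: the function name mar_mask_sketch, its parameters p_miss and p_obs, and the logistic model used here are all assumptions made for this example. The idea is that a subset of columns is always observed, and the probability that any other entry is missing depends only on those observed columns.

```python
import numpy as np


def mar_mask_sketch(X, p_miss=0.3, p_obs=0.5, seed=0):
    """Return a boolean mask (True = missing) under a simple MAR mechanism.

    Hypothetical helper for illustration; it is not the function in utils.py.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Columns that are always observed and drive the missingness.
    n_obs = max(int(p_obs * d), 1)
    idx_obs = rng.choice(d, size=n_obs, replace=False)
    idx_mis = np.setdiff1d(np.arange(d), idx_obs)

    # Standardize the driver columns so the logits have a comparable scale.
    Z = X[:, idx_obs]
    Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-8)

    # Logistic model: the missingness probability of each remaining column
    # depends only on the always-observed columns (random weights, assumed).
    W = rng.normal(size=(n_obs, len(idx_mis)))
    logits = Z @ W
    # Intercept set to logit(p_miss); this only roughly targets the missing rate.
    logits += np.log(p_miss / (1 - p_miss))
    probs = 1.0 / (1.0 + np.exp(-logits))

    mask = np.zeros((n, d), dtype=bool)
    mask[:, idx_mis] = rng.random((n, len(idx_mis))) < probs
    return mask


# Usage on synthetic data: replace masked entries with NaN.
X = np.random.default_rng(1).normal(size=(200, 6))
mask = mar_mask_sketch(X)
X_miss = np.where(mask, np.nan, X)
```

Under an MNAR mechanism, by contrast, the missingness probability would also depend on the values that end up masked; the sketch above deliberately excludes that dependence.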
An example notebook is also available: UCI_demo.ipynb.
Muzellec B., Josse J., Boyer C., Cuturi, M.: Missing Data Imputation using Optimal Transport
@inproceedings{muzellec2020missing,
title={Missing Data Imputation using Optimal Transport},
author={Muzellec, Boris and Josse, Julie and Boyer, Claire and Cuturi, Marco},
booktitle={International Conference on Machine Learning},
pages={7130--7140},
year={2020},
organization={PMLR}
}
To use the data loading utilities in data_loaders.py, wget is also required.