I'd like to propose evaluating the fidelity of the full joint distribution using optimal transport, as done in the Generative Forests preprint (2024).
Here's an example implementation in Python using the regularized Sinkhorn algorithm they use.
A few notes:
- this would address the limitations of evaluating only low-dimensional marginals, which can miss higher-order dependencies between columns
- users would need to generate the same number of synthetic samples as there are test cases
- we could begin with the discretized data, and later experiment with continuous data, normalized so that distances are comparable across features
- this would let us investigate how the quality of fit (compared to baselines) changes as we vary the number of dimensions of the marginal being evaluated
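
To make the notes above concrete, here is a rough sketch of what such a metric could look like. It is not the preprint's code: it assumes NumPy only, uniform weights on both samples, a squared Euclidean cost, and log-domain Sinkhorn updates for numerical stability. The names `sinkhorn_cost` and `minmax_scale`, and the defaults for `reg` and `n_iters`, are placeholders chosen for illustration.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def minmax_scale(reference, *others):
    # Rescale every array with the per-column min/max of `reference`
    # (e.g. the test set) so distances are comparable across features.
    reference = np.asarray(reference, dtype=float)
    lo = reference.min(axis=0)
    span = reference.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    return [(np.asarray(x, dtype=float) - lo) / span for x in (reference, *others)]


def sinkhorn_cost(real, synth, reg=0.05, n_iters=500):
    # Entropy-regularized OT cost between two equally sized samples.
    # `reg` is relative to the largest pairwise cost (an assumption of
    # this sketch, not a setting taken from the preprint).
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    if real.ndim != 2 or real.shape != synth.shape:
        raise ValueError("expected two 2-D arrays with identical shapes")
    n = real.shape[0]

    # Pairwise squared Euclidean distances, shape (n, n).
    cost = ((real[:, None, :] - synth[None, :, :]) ** 2).sum(axis=2)
    scale = cost.max()
    if scale == 0:
        return 0.0
    scaled = cost / scale

    log_w = np.full(n, -np.log(n))  # uniform weights on both samples
    f = np.zeros(n)
    g = np.zeros(n)
    for _ in range(n_iters):
        # Alternately enforce the row and column marginals (log domain).
        f = reg * (log_w - _logsumexp((g[None, :] - scaled) / reg, axis=1))
        g = reg * (log_w - _logsumexp((f[:, None] - scaled) / reg, axis=0))

    plan = np.exp((f[:, None] + g[None, :] - scaled) / reg)
    # Transport cost of the regularized plan, in the original cost units.
    return float((plan * cost).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    test_set = rng.normal(size=(200, 4))
    synthetic = rng.normal(size=(200, 4))  # same row count as the test set
    test_scaled, synth_scaled = minmax_scale(test_set, synthetic)
    print(sinkhorn_cost(test_scaled, synth_scaled))
```

For discretized data the same function could be applied to an integer or one-hot encoding of the bins, and restricting both arrays to a subset of columns would give the per-marginal variant from the last note. The cost matrix is dense (n × n), so memory use grows quadratically with the number of test cases.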