Is your feature request related to a problem? Please describe.
Right now users need to specify a unique data source; there are possible implicit conflicts between local files and huggingface datasets, for example, which can be confusing.
Describe alternatives you've considered
–data {“type”: “synthetic”, ...} to force user to clearly specify expectations.