Description
The creation of the adf_dataset script is a great way to centralize how we gather data throughout the ADF; however, it is currently incomplete with respect to the three types of files the ADF can process: time-series, climo, and regridded-climo files.
The script has a nice template from which each of these file types can be handled, but it is not finished. For each of the three file types, I think it would be great to complete this work by extending the load functions to be broken up like this:
- `get_<file_type>_file`: gather/check files for the test case(s)
- `load_<file_type>_dataset`: return a dataset for the test case(s)
- `load_<file_type>_da`: return a data array for a variable for the test case(s)
- `get_ref_<file_type>_file`: gather/check files for the reference/baseline case
- `load_reference_<file_type>_dataset`: return a dataset for the reference/baseline case
- `load_reference_<file_type>_da`: return a data array for a variable for the reference/baseline case
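A minimal sketch of what this hierarchy could look like for one file type (climo, test-case side). All names, the glob pattern, and the `loader` argument are hypothetical illustrations of the pattern, not the actual ADF API; the real functions would presumably call `xarray.open_mfdataset` where `loader` is invoked here:

```python
from pathlib import Path


def get_climo_file(data_dir, case_name, variable):
    """Gather/check climo files for a test case; an empty list means missing.
    The glob pattern is a made-up placeholder, not the real ADF naming scheme."""
    return sorted(Path(data_dir).glob(f"{case_name}_{variable}_climo*.nc"))


def load_climo_dataset(data_dir, case_name, variable, loader):
    """Return a dataset for a test case, or None if no files were found.
    `loader` stands in for something like xarray.open_mfdataset."""
    files = get_climo_file(data_dir, case_name, variable)
    if not files:
        print(f"WARNING: no climo files found for {variable} in {case_name}")
        return None
    return loader(files)


def load_climo_da(data_dir, case_name, variable, loader):
    """Return the data array for one variable; this is the single place
    where scale factors/offsets and unit changes would be applied."""
    ds = load_climo_dataset(data_dir, case_name, variable, loader)
    if ds is None:
        return None
    return ds[variable]
```

The `get_ref_*`/`load_reference_*` variants would follow the same shape, just pointed at the baseline case's paths.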
This would allow the ADF to be consistent wherever datasets/arrays need to be loaded, and would centralize the data cleaning needed for variables, i.e. applying scale factors/offsets, new units, etc. For example, the AMWG tables are currently loaded generically via xarray through load_dataset in lib/adf_utils.py and are not getting the scale factors/new units; see Issue #423
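As a sketch of the kind of centralized cleaning step this would enable, the function below applies a per-variable scale factor, offset, and unit label in one place. The function name and the dictionary keys (`scale_factor`, `add_offset`, `new_unit`) are assumptions for illustration, loosely modeled on the variable defaults the ADF already carries:

```python
def apply_variable_defaults(data, var_name, variable_defaults):
    """Apply scale factor, offset, and new units for one variable in a
    single place, so tables and plots can never drift out of sync.
    The keys used here are illustrative, not the real ADF defaults schema."""
    opts = variable_defaults.get(var_name, {})
    scale = opts.get("scale_factor", 1.0)
    offset = opts.get("add_offset", 0.0)
    cleaned = data * scale + offset
    new_unit = opts.get("new_unit")  # None means "keep original units"
    return cleaned, new_unit
```

For example, a precipitation rate in m/s could be converted to mm/day by registering a scale factor of 8.64e7 for that variable; any caller going through this one function (tables or plots) would then see identical values and units.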
Question: would we ever want similar infrastructure to be able to work with raw history files too?