Clean up data set and data array calls from adf_dataset.py #424

@justin-richling

Description

The creation of the adf_dataset script is a great way to centralize how we gather data throughout the ADF; however, it seems to be incomplete with respect to the three types of files the ADF can process: time series, climo, and regridded climo files.

This script has a nice template through which each of these file types can be called, but it is not complete. For each of the three file types, I think it would be great to finish this work by breaking the load functions up like this:

get_<file_type>_file: gather/check files for test case(s)
load_<file_type>_dataset: return a data set for test case(s)
load_<file_type>_da: return a data array for variable for test case(s)

get_ref_<file_type>_file: gather/check files for the reference/baseline case
load_reference_<file_type>_dataset: return a data set for reference/baseline case
load_reference_<file_type>_da: return a data array for variable for reference/baseline case
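The proposed split above could be sketched roughly like this for one file type. This is a minimal illustration, not the real ADF API: the class name, constructor, and file-naming pattern are all assumptions, and the xarray call is deferred so the skeleton stands on its own.

```python
# Hypothetical sketch of the get/load_dataset/load_da trio for the "climo"
# file type. AdfData, climo_dir, and the "<case>_<var>_climo.nc" naming
# pattern are illustrative assumptions, not the actual ADF implementation.
from pathlib import Path


class AdfData:
    """Centralized data access: one get/load_dataset/load_da trio per file type."""

    def __init__(self, climo_dir):
        self.climo_dir = Path(climo_dir)

    def get_climo_file(self, case, variable):
        """Gather/check climo files for a test case; return [] if none found."""
        pattern = f"{case}_{variable}_climo.nc"  # assumed naming convention
        files = sorted(self.climo_dir.glob(pattern))
        if not files:
            print(f"WARNING: no climo files found for {case}/{variable}")
        return files

    def load_climo_dataset(self, case, variable):
        """Return a dataset for the test case, or None if files are missing."""
        files = self.get_climo_file(case, variable)
        if not files:
            return None
        import xarray as xr  # deferred so the sketch runs without xarray
        return xr.open_mfdataset(files, combine="by_coords")

    def load_climo_da(self, case, variable):
        """Return the data array for one variable, or None."""
        ds = self.load_climo_dataset(case, variable)
        if ds is None:
            return None
        # centralized variable cleaning (scale/offset, new units) goes here
        return ds[variable]
```

The reference/baseline trio (get_ref_climo_file, etc.) would mirror this, differing only in where it looks for files, so every diagnostic script gets data through the same checked, cleaned path.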

This would allow the ADF to be consistent wherever data sets/arrays need to be called, as well as centralize the data cleaning needed for variables, i.e. applying scale factors/offsets, new units, etc. For example, the AMWG tables are currently loaded generically via xarray through load_dataset in lib/adf_utils.py and are not getting the scale factors/new units; see Issue #423.
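The centralized cleaning step could look something like the sketch below. The function name and the keys of the defaults dict (scale_factor, add_offset, new_unit) are assumptions standing in for whatever the ADF's variable-defaults file actually provides; the point is that every loader would funnel through one such function, so the AMWG tables and the plots all see identically scaled data.

```python
# Hypothetical centralized cleaning step: apply any scale factor/offset and
# unit relabeling from a variable-defaults dict in one place. The "res" keys
# (scale_factor, add_offset, new_unit) are illustrative assumptions.
def apply_variable_defaults(values, attrs, res):
    """Scale/offset raw values and update the units attribute per defaults."""
    scale = res.get("scale_factor", 1.0)
    offset = res.get("add_offset", 0.0)
    cleaned = [v * scale + offset for v in values]
    new_attrs = dict(attrs)  # copy so the caller's attributes stay untouched
    if "new_unit" in res:
        new_attrs["units"] = res["new_unit"]
    return cleaned, new_attrs
```

For instance, converting precipitation from m/s to mm/day would use a scale factor of 8.64e7 and relabel the units, and any script calling the load_*_da functions would receive the already-converted array.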

Question: would we ever want similar infrastructure to be able to work with raw history files too?

Metadata

Labels

bug: Something isn't working
bug-fix: Fixes a particular bug (or set of bugs)
code clean-up: Made code simpler and/or easier to read.