Parallel I/O integration #641
Conversation
… for mpi_grp_t constructor
Force-pushed from d5e602c to 3327948.
The MPI master driver is passed a group containing a single process, since I/O in the master driver is done serially.
Global structure variables describe the grid of the global simulation domain, irrespective of the current MPI rank. These are mland_global, land_x_global, land_y_global, mp_global and landpt_global. This change updates static (time invariant) arrays in the met file to be stored on the global land grid (size mland_global). These variables are:
- cable_iovars.F90:latitude(mland_global)
- cable_iovars.F90:longitude(mland_global)
- cable_iovars.F90:vegtype_metfile(mland_global, nmetpatches)
- cable_iovars.F90:vegpatch_metfile(mland_global, nmetpatches)
- cable_iovars.F90:soiltype_metfile(mland_global, nmetpatches)
- cable_iovars.F90:PrecipScale(mland_global)
- cable_input.F90:elevation(mland_global)
- cable_input.F90:avPrecip(mland_global)
Local structure variables describe the grid information local to the current MPI rank. These are mp, mland, land_x, land_y, land_decomp_start, land_decomp_end, patch_decomp_start, patch_decomp_end and landpt.
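As a rough sketch of that split (variable names are taken from this description; the slicing convention shown is an assumption for illustration only):

integer :: mland_global                 ! land points on the global grid
integer :: mland                        ! land points local to this MPI rank
integer :: land_decomp_start, land_decomp_end
real, allocatable :: latitude(:)        ! static (time invariant) met array

! Static met arrays are allocated on the global land grid:
allocate(latitude(mland_global))

! A rank's local portion is then the slice
! latitude(land_decomp_start:land_decomp_end),
! with mland = land_decomp_end - land_decomp_start + 1.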
Force-pushed from 3327948 to a18d437.
Hi @Whyborn, don't want to scare you with a huge PR 😅, but could I get your thoughts on the commits after Temporary commit (to remove): add unit tests? This includes some functionality for handling I/O decompositions which did not exist in Mock CABLE.
Whyborn left a comment:
I looked at the CABLE part of the diffs. If I'm understanding right, the code doesn't yet do PIO output, right? It's just getting the data structures in place to handle it in future? Personally I'd like the decompositions to be handled by the respective science modules, so that they're more self-contained. The module can expose the likely dimensions of the output, e.g. nlat, nlon, ntiles etc., via get_nlon(), get_nlat(), ..., but I think having them exposed by the decomp module makes it restrictive.
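As a minimal sketch of that accessor idea (get_nlon() and get_nlat() are named in the comment above; the module name and internals here are assumptions):

module output_dims_mod
    implicit none
    private
    integer :: nlon = 0, nlat = 0, ntiles = 0   ! set during module initialisation
    public :: get_nlon, get_nlat, get_ntiles
contains
    integer function get_nlon()
        get_nlon = nlon
    end function get_nlon
    integer function get_nlat()
        get_nlat = nlat
    end function get_nlat
    integer function get_ntiles()
        get_ntiles = ntiles
    end function get_ntiles
end module output_dims_mod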
SUBROUTINE cable_abort(message, file, line)
    !! Print the error message and stop the code
    CHARACTER(LEN=*), INTENT(IN) :: message !! Error message
    CHARACTER(LEN=*), INTENT(IN), OPTIONAL :: file
    INTEGER, INTENT(IN), OPTIONAL :: line
    CHARACTER(5) :: line_string
I'd be in favour of making file and line compulsory - I know that gfortran at least defines __LINE__ and __FILE__ macros that people can use, though I'm not sure whether that's part of the standard.
edit: Saw you did use that in some of the abort routines elsewhere.
I agree, I don't see why file and line shouldn't be compulsory - making them OPTIONAL only allowed me to find and replace abort with cable_abort without having to specify the file and line arguments. It would be good to do this properly in a separate PR which addresses error handling (related: #486).
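For reference, a minimal sketch of the compulsory-argument version being discussed (the interface follows the diff above; the body and ERROR STOP are illustrative, not the actual implementation):

SUBROUTINE cable_abort(message, file, line)
    !! Sketch: file and line made compulsory, as discussed above
    CHARACTER(LEN=*), INTENT(IN) :: message
    CHARACTER(LEN=*), INTENT(IN) :: file   ! no longer OPTIONAL
    INTEGER, INTENT(IN) :: line            ! no longer OPTIONAL
    WRITE(*, '(A, ":", I0, ": ", A)') file, line, message
    ERROR STOP
END SUBROUTINE cable_abort

! Call sites can then use the preprocessor macros mentioned above:
! CALL cable_abort("Invalid input.", file=__FILE__, line=__LINE__)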
src/offline/cable_io_decomp.F90 (Outdated)
requires_x_y_output_grid = ( &
    ( &
        output%grid == 'default' .AND. metGrid == 'mask' &
    ) .OR. ( &
        output%grid == 'mask' .OR. output%grid == 'ALMA' &
    ) &
)
if (requires_x_y_output_grid) then
    io_decomp%output_base_real32 => io_decomp%land_to_x_y_real32
    io_decomp%output_base_soil_real32 => io_decomp%land_soil_to_x_y_soil_real32
    io_decomp%output_base_snow_real32 => io_decomp%land_snow_to_x_y_snow_real32
    io_decomp%output_base_rad_real32 => io_decomp%land_rad_to_x_y_rad_real32
    io_decomp%output_base_plantcarbon_real32 => io_decomp%land_plantcarbon_to_x_y_plantcarbon_real32
    io_decomp%output_base_soilcarbon_real32 => io_decomp%land_soilcarbon_to_x_y_soilcarbon_real32
    io_decomp%output_base_patch_real32 => io_decomp%patch_to_x_y_patch_real32
    io_decomp%output_base_patch_soil_real32 => io_decomp%patch_soil_to_x_y_patch_soil_real32
    io_decomp%output_base_patch_snow_real32 => io_decomp%patch_snow_to_x_y_patch_snow_real32
    io_decomp%output_base_patch_rad_real32 => io_decomp%patch_rad_to_x_y_patch_rad_real32
    io_decomp%output_base_patch_plantcarbon_real32 => io_decomp%patch_plantcarbon_to_x_y_patch_plantcarbon_real32
    io_decomp%output_base_patch_soilcarbon_real32 => io_decomp%patch_soilcarbon_to_x_y_patch_soilcarbon_real32
end if

requires_land_output_grid = ( &
    output%grid == 'land' .OR. (output%grid == 'default' .AND. metGrid == 'land') &
)
if (requires_land_output_grid) then
    io_decomp%output_base_real32 => io_decomp%land_to_land_real32
    io_decomp%output_base_soil_real32 => io_decomp%land_soil_to_land_soil_real32
    io_decomp%output_base_snow_real32 => io_decomp%land_snow_to_land_snow_real32
    io_decomp%output_base_rad_real32 => io_decomp%land_rad_to_land_rad_real32
    io_decomp%output_base_plantcarbon_real32 => io_decomp%land_plantcarbon_to_land_plantcarbon_real32
    io_decomp%output_base_soilcarbon_real32 => io_decomp%land_soilcarbon_to_land_soilcarbon_real32
    io_decomp%output_base_patch_real32 => io_decomp%patch_to_land_patch_real32
    io_decomp%output_base_patch_soil_real32 => io_decomp%patch_soil_to_land_patch_soil_real32
    io_decomp%output_base_patch_snow_real32 => io_decomp%patch_snow_to_land_patch_snow_real32
    io_decomp%output_base_patch_rad_real32 => io_decomp%patch_rad_to_land_patch_rad_real32
    io_decomp%output_base_patch_plantcarbon_real32 => io_decomp%patch_plantcarbon_to_land_patch_plantcarbon_real32
    io_decomp%output_base_patch_soilcarbon_real32 => io_decomp%patch_soilcarbon_to_land_patch_soilcarbon_real32
end if
Is this to support output of either gridded or vectorised outputs? Why isn't this an IF - ELSEIF - ELSE? Where do site runs fall here?
> Is this to support output of either gridded or vectorised outputs?
Yes, base here can either refer to land or x_y depending on the output settings.
I agree, and we should probably have an ELSE IF here, and error if the settings were misconfigured - I'll address this.
Also relevant:
For site runs, I think metGrid is initialised to 'mask' here
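A sketch of the IF / ELSE IF / ELSE restructure being discussed (the conditions are taken from the diff above; the abort message is illustrative):

if ((output%grid == 'default' .AND. metGrid == 'mask') &
        .OR. output%grid == 'mask' .OR. output%grid == 'ALMA') then
    ! associate the x_y output pointers (first block in the diff above)
else if (output%grid == 'land' &
        .OR. (output%grid == 'default' .AND. metGrid == 'land')) then
    ! associate the land output pointers (second block)
else
    call cable_abort("Unsupported output%grid / metGrid combination.", &
        file=__FILE__, line=__LINE__)
end if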
integer function subscript(shape_spec, name)
    type(dim_spec_t), intent(in) :: shape_spec(:)
    character(*), intent(in) :: name
    integer i
    do i = 1, size(shape_spec)
        if (shape_spec(i)%name == name) then
            subscript = i
            return
        end if
    end do
    call cable_abort("Name '" // name // "' not found in shape_spec.", file=__FILE__, line=__LINE__)
    subscript = -1
end function
I think the control here can be a bit confusing, with it usually "returning" midway through the function. Could you initialise subscript = -1, and only exit the loop when the name is found, then check the value afterwards? e.g.

function subscript(...)
    subscript = -1
    do i = 1, size(shape_spec)
        if (shape_spec(i)%name == name) then
            subscript = i
            exit
        end if
    end do
    if (subscript == -1) then
        call cable_abort(...)
    end if
end function

Might just be me though, it certainly works perfectly fine as is.
I agree with you here, I'll update.
logical :: requires_land_output_grid, requires_x_y_output_grid

mem_shape_land = [dim_spec_t('land', mland)]
Might be a dumb question, but where is the concrete dim_spec_t, used as the source to define these dim specs, defined?
dim_spec_t is a derived type which associates a name with a dimension.
The %name component of dim_spec_t is used to distinguish "special" dimensions which may be substituted when mapping an array in memory to and from a NetCDF variable. For example, the patch dimension which ranges from 1 to mp could be mapped to a (land, patch) coordinate which ranges from (1, 1) to (mland_global, max_veg_patches). This is done in src/util/netcdf/cable_netcdf_decomp_util.F90.
As for the actual shapes, mem_shape_* is chosen to be consistent with internal CABLE data structures, and var_shape_* is chosen to be consistent with the NetCDF variable shapes for CABLE output and input variables.
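For illustration, a minimal sketch of what such a type could look like (component names here are assumptions; only the name-plus-extent pairing is from this discussion):

type :: dim_spec_t
    character(len=32) :: name   ! e.g. 'land', 'patch', 'soil'
    integer :: extent           ! e.g. mland for the 'land' dimension
end type dim_spec_t

! consistent with the constructor usage in the diff above:
! mem_shape_land = [dim_spec_t('land', mland)]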
I will make sure to document this of course 😄 but does that answer your question?
My personal preference would be to have the decompositions defined somewhere closer to the use site e.g. in the output module(s). Also, how much does this change if we remove the capability to write out in different formats, and force an x-y grid format?
There should be no issues with declaring the decomposition objects in separate modules where they are used.
I suspect, however, that there will be decompositions which would be shared across applications when the in-memory array and the NetCDF variable are the same shape. So in addition to separating the modules, I think there could be value in having a common decomposition object, where each application declares pointers which are associated with the common decomposition objects.
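A hypothetical sketch of that arrangement (module and type names are assumptions; only the pointer-association pattern is from this discussion):

module common_decomp_mod
    implicit none
    type :: decomp_t
        integer, allocatable :: map(:)   ! stand-in for the real decomposition data
    end type decomp_t
    type(decomp_t), target :: land_to_x_y_real32   ! shared object, built once
end module common_decomp_mod

module output_decomp_mod
    use common_decomp_mod, only: decomp_t, land_to_x_y_real32
    implicit none
    type(decomp_t), pointer :: output_base_real32 => null()
contains
    subroutine init_output_decomp()
        ! reuse the common decomposition instead of building a duplicate
        output_base_real32 => land_to_x_y_real32
    end subroutine init_output_decomp
end module output_decomp_mod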
> how much does this change if we remove the capability to write out in different formats, and force an x-y grid format?
If you enforce an x-y grid for all I/O then you should only be left with decompositions which involve x_y in the name.
IF (mpi_grp%size == 1 .OR. .NOT. cable_user%mpi_legacy) THEN
    CALL serialdrv(NRRRR, dels, koffset, kend, GSWP_MID, PLUME, CRU, site, mpi_grp)
ELSE
    IF (mpi_grp%rank == 0) THEN
        CALL mpidrv_master(mpi_grp%comm%mpi_val, dels, koffset, kend, PLUME, CRU)
        CALL mpi_grp%split(COLOR_MASTER, mpi_grp%rank, mpi_grp_master)
        CALL mpidrv_master(mpi_grp%comm%mpi_val, dels, koffset, kend, PLUME, CRU, mpi_grp_master)
    ELSE
        CALL mpi_grp%split(COLOR_WORKER, mpi_grp%rank, mpi_grp_worker)
Is this splitting of MPI master and worker a temporary fix while the rest of the MPI implementation is worked out? The plan is still to collapse the drivers into one, right?
Yes, we are definitely aiming to have only a single driver (this would be the serial driver). The plan is to deprecate the legacy MPI implementation until we are confident the new implementation works correctly.
So not really collapsing the drivers per se, but getting to a point where we can safely remove mpidrv_master and mpidrv_worker.
! count to obtain 'landpt_global', 'max_vegpatches' and 'mp_global'
CALL countPatch(nlon, nlat, npatch)

CALL init_local_structure_variables(mpi_grp)
Why is this called here? It appears that it only depends on mpi_grp (I know that's not actually the case, and it uses global variables, but if we can justify that, we can put it anywhere). It seems a bit clunky to have to pass mpi_grp down the call stack just to pass it to this function.
I agree, it's a pretty weird place to call init_local_structure_variables; it's definitely a sign we need to pull some things out of load_parameters. It was put here because init_local_structure_variables needs to be called after countPatch, since countPatch initialises the rest of the global structure variables, and the local structure variables need to be initialised before allocate_cable_vars.
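The ordering constraint, summarised at the call site (comments are illustrative):

! in load_parameters (sketch of the required ordering):
CALL countPatch(nlon, nlat, npatch)           ! fills landpt_global, max_vegpatches, mp_global
CALL init_local_structure_variables(mpi_grp)  ! derives the rank-local mland, landpt, decomp bounds
! ... later, allocate_cable_vars relies on the local sizes set above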
Experimental branch for integrating parallel I/O (part of #358). This should not be merged into main as is.
📚 Documentation preview 📚: https://cable--641.org.readthedocs.build/en/641/