
Conversation

@SeanBryan51 (Collaborator) commented Oct 27, 2025

Experimental branch for integrating parallel I/O (part of #358). This should not be merged into main as is.


📚 Documentation preview 📚: https://cable--641.org.readthedocs.build/en/641/

@SeanBryan51 force-pushed the parallelio-playground branch 2 times, most recently from d5e602c to 3327948, on October 27, 2025 03:21
The MPI master driver is passed a group containing a single process,
since the I/O in the master driver is done serially.
Global structure variables describe the grid of the global simulation
domain, irrespective of the current MPI rank. These are mland_global,
land_x_global, land_y_global, mp_global and landpt_global.

This change updates static (time-invariant) arrays read from the met file
so that they are stored on the global land grid (size mland_global). These
variables are:
- cable_iovars.F90:latitude(mland_global)
- cable_iovars.F90:longitude(mland_global)
- cable_iovars.F90:vegtype_metfile(mland_global, nmetpatches)
- cable_iovars.F90:vegpatch_metfile(mland_global, nmetpatches)
- cable_iovars.F90:soiltype_metfile(mland_global, nmetpatches)
- cable_iovars.F90:PrecipScale(mland_global)
- cable_input.F90:elevation(mland_global)
- cable_input.F90:avPrecip(mland_global)
Local structure variables describe the grid information local to the
current MPI rank. These are mp, mland, land_x, land_y,
land_decomp_start, land_decomp_end, patch_decomp_start, patch_decomp_end
and landpt.
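
For illustration, the global/local split described above might look roughly like the following in the iovars module (a sketch only; types, attributes and the exact declarations in the PR may differ):

    ! Global grid description, identical on every MPI rank (sketch).
    INTEGER :: mland_global                    ! number of land points in the global domain
    INTEGER :: mp_global                       ! number of patches in the global domain
    INTEGER, ALLOCATABLE :: land_x_global(:), land_y_global(:)

    ! Local grid description, specific to the current MPI rank (sketch).
    INTEGER :: mland, mp                       ! land points / patches owned by this rank
    INTEGER :: land_decomp_start, land_decomp_end
    INTEGER :: patch_decomp_start, patch_decomp_end
    INTEGER, ALLOCATABLE :: land_x(:), land_y(:)

    ! Static met-file arrays are now sized on the global land grid, e.g.
    REAL, ALLOCATABLE :: latitude(:), longitude(:)   ! allocated with size mland_global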
@SeanBryan51 force-pushed the parallelio-playground branch from 3327948 to a18d437 on October 27, 2025 03:30
@SeanBryan51 changed the title from "Investigate parallel I/O integration" to "Parallel I/O integration" on Oct 27, 2025
@SeanBryan51 (Collaborator, Author) commented:

Hi @Whyborn, I don't want to scare you with a huge PR 😅, but could I get your thoughts on the commits after "Temporary commit (to remove): add unit tests"? These include some functionality for handling I/O decompositions which did not exist in Mock CABLE.

@Whyborn (Contributor) left a comment:

I looked at the CABLE part of the diffs. If I'm understanding right, the code doesn't do PIO output yet, right? It's just getting the data structures in place to handle it in future? Personally I'd like the decompositions to be handled by the respective science modules, so that they're more self-contained. Each module could expose the likely dimensions of its output, e.g. nlat, nlon, ntiles, via get_nlon(), get_nlat(), ..., but I think having them exposed by the decomp module makes things restrictive.
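
Something along these lines is what I have in mind (sketch only; the module and variable names are illustrative):

    module output_dims_mod   ! hypothetical science/output module
      implicit none
      private
      integer :: nlon = 0, nlat = 0, ntiles = 0
      public :: get_nlon, get_nlat, get_ntiles
    contains
      integer function get_nlon()
        get_nlon = nlon
      end function get_nlon
      integer function get_nlat()
        get_nlat = nlat
      end function get_nlat
      integer function get_ntiles()
        get_ntiles = ntiles
      end function get_ntiles
    end module output_dims_mod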

Comment on lines +39 to +44
SUBROUTINE cable_abort(message, file, line)
!! Print the error message and stop the code
CHARACTER(LEN=*), INTENT(IN) :: message !! Error message
CHARACTER(LEN=*), INTENT(IN), OPTIONAL :: file
INTEGER, INTENT(IN), OPTIONAL :: line
CHARACTER(5) :: line_string
@Whyborn (Contributor) commented:

I'd be in favour of making file and line compulsory. I know that gfortran at least defines __LINE__ and __FILE__ macros that people can use, not sure whether that's part of the standard though.

Edit: I saw you did use that in some of the abort routines elsewhere.

@SeanBryan51 (Collaborator, Author) replied:

I agree, I don't see why file and line shouldn't be compulsory. Making them OPTIONAL only meant that I could find-and-replace abort with cable_abort without having to specify the file and line arguments. It would be good to do this properly in a separate PR which addresses error handling (related: #486).
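
For reference, with compulsory arguments every call site would then look something like this (sketch; the message string is illustrative):

    ! __FILE__ and __LINE__ are expanded by the preprocessor (cpp/fpp).
    CALL cable_abort("Unable to open met file.", file=__FILE__, line=__LINE__)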

Comment on lines 215 to 253
requires_x_y_output_grid = ( &
( &
output%grid == 'default' .AND. metGrid == 'mask' &
) .OR. ( &
output%grid == 'mask' .OR. output%grid == 'ALMA' &
) &
)
if (requires_x_y_output_grid) then
io_decomp%output_base_real32 => io_decomp%land_to_x_y_real32
io_decomp%output_base_soil_real32 => io_decomp%land_soil_to_x_y_soil_real32
io_decomp%output_base_snow_real32 => io_decomp%land_snow_to_x_y_snow_real32
io_decomp%output_base_rad_real32 => io_decomp%land_rad_to_x_y_rad_real32
io_decomp%output_base_plantcarbon_real32 => io_decomp%land_plantcarbon_to_x_y_plantcarbon_real32
io_decomp%output_base_soilcarbon_real32 => io_decomp%land_soilcarbon_to_x_y_soilcarbon_real32
io_decomp%output_base_patch_real32 => io_decomp%patch_to_x_y_patch_real32
io_decomp%output_base_patch_soil_real32 => io_decomp%patch_soil_to_x_y_patch_soil_real32
io_decomp%output_base_patch_snow_real32 => io_decomp%patch_snow_to_x_y_patch_snow_real32
io_decomp%output_base_patch_rad_real32 => io_decomp%patch_rad_to_x_y_patch_rad_real32
io_decomp%output_base_patch_plantcarbon_real32 => io_decomp%patch_plantcarbon_to_x_y_patch_plantcarbon_real32
io_decomp%output_base_patch_soilcarbon_real32 => io_decomp%patch_soilcarbon_to_x_y_patch_soilcarbon_real32
end if

requires_land_output_grid = ( &
output%grid == 'land' .OR. (output%grid == 'default' .AND. metGrid == 'land') &
)
if (requires_land_output_grid) then
io_decomp%output_base_real32 => io_decomp%land_to_land_real32
io_decomp%output_base_soil_real32 => io_decomp%land_soil_to_land_soil_real32
io_decomp%output_base_snow_real32 => io_decomp%land_snow_to_land_snow_real32
io_decomp%output_base_rad_real32 => io_decomp%land_rad_to_land_rad_real32
io_decomp%output_base_plantcarbon_real32 => io_decomp%land_plantcarbon_to_land_plantcarbon_real32
io_decomp%output_base_soilcarbon_real32 => io_decomp%land_soilcarbon_to_land_soilcarbon_real32
io_decomp%output_base_patch_real32 => io_decomp%patch_to_land_patch_real32
io_decomp%output_base_patch_soil_real32 => io_decomp%patch_soil_to_land_patch_soil_real32
io_decomp%output_base_patch_snow_real32 => io_decomp%patch_snow_to_land_patch_snow_real32
io_decomp%output_base_patch_rad_real32 => io_decomp%patch_rad_to_land_patch_rad_real32
io_decomp%output_base_patch_plantcarbon_real32 => io_decomp%patch_plantcarbon_to_land_patch_plantcarbon_real32
io_decomp%output_base_patch_soilcarbon_real32 => io_decomp%patch_soilcarbon_to_land_patch_soilcarbon_real32
end if
@Whyborn (Contributor) commented:

Is this to support output of either gridded or vectorised outputs? Why isn't it an IF / ELSE IF / ELSE? Where do site runs fall here?

@SeanBryan51 (Collaborator, Author) replied on Oct 28, 2025:

Is this to support output of either gridded or vectorised outputs?

Yes, base here can either refer to land or x_y depending on the output settings.

I agree, we should probably have an ELSE IF here, plus an ELSE to abort if the settings are misconfigured. I'll address this.
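
Roughly what I have in mind (sketch; only a couple of the pointer associations are shown):

    if (requires_x_y_output_grid) then
        io_decomp%output_base_real32 => io_decomp%land_to_x_y_real32
        io_decomp%output_base_patch_real32 => io_decomp%patch_to_x_y_patch_real32
        ! ... remaining x_y pointer associations as above ...
    else if (requires_land_output_grid) then
        io_decomp%output_base_real32 => io_decomp%land_to_land_real32
        io_decomp%output_base_patch_real32 => io_decomp%patch_to_land_patch_real32
        ! ... remaining land pointer associations as above ...
    else
        call cable_abort("Unsupported combination of output%grid and metGrid.", &
            file=__FILE__, line=__LINE__)
    end if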

Also relevant:

For site runs, I think metGrid is initialised to 'mask' here

Comment on lines +24 to +36
integer function subscript(shape_spec, name)
type(dim_spec_t), intent(in) :: shape_spec(:)
character(*), intent(in) :: name
integer i
do i = 1, size(shape_spec)
if (shape_spec(i)%name == name) then
subscript = i
return
end if
end do
call cable_abort("Name '" // name // "' not found in shape_spec.", file=__FILE__, line=__LINE__)
subscript = -1
end function
@Whyborn (Contributor) commented:

I think the control flow here can be a bit confusing, with it usually "returning" midway through the function. Could you initialise subscript = -1, only exit the loop when the name is found, and then check the value afterwards? e.g.

function subscript(...)

  subscript=-1
  do i = 1, size(shape_spec)
    if (shape_spec(i)%name == name) then
      subscript=i
      exit
    end if
  end do
  
  if (subscript == -1) then
    call cable_abort(...)
  end if
end function

Might just be me though, it certainly works perfectly fine as is.

@SeanBryan51 (Collaborator, Author) replied:

I agree with you here, I'll update.


logical :: requires_land_output_grid, requires_x_y_output_grid

mem_shape_land = [dim_spec_t('land', mland)]
@Whyborn (Contributor) commented:

Might be a dumb question, but where is the concrete dim_spec_t type, used as the source to define these dim specs, actually defined?

@SeanBryan51 (Collaborator, Author) replied:

dim_spec_t is a derived type which associates a name with a dimension.

The %name component of dim_spec_t is used to distinguish "special" dimensions which may be substituted when mapping an array in memory to and from a NetCDF variable. For example, the patch dimension which ranges from 1 to mp could be mapped to a (land, patch) coordinate which ranges from (1, 1) to (mland_global, max_veg_patches). This is done in src/util/netcdf/cable_netcdf_decomp_util.F90.

As for the actual shapes, mem_shape_* is chosen to be consistent with internal CABLE data structures, and var_shape_* is chosen to be consistent with the NetCDF variable shapes for CABLE output and input variables.
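
For concreteness, the type looks roughly like this (sketch; the exact component names and kinds may differ):

    type :: dim_spec_t
        character(len=32) :: name    ! dimension name, e.g. 'land', 'patch', 'soil'
        integer           :: extent  ! dimension size, e.g. mland, mp
    end type dim_spec_t

    ! used to build shape descriptors such as
    ! mem_shape_land = [dim_spec_t('land', mland)]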

I will make sure to document this of course 😄 but does that answer your question?

@Whyborn (Contributor) replied:

My personal preference would be to have the decompositions defined somewhere closer to the use site, e.g. in the output module(s). Also, how much does this change if we remove the capability to write out in different formats, and force an x-y grid format?

@SeanBryan51 (Collaborator, Author) replied on Oct 28, 2025:

There should be no issues with declaring the decomposition objects in separate modules where they are used.

I suspect however that there will be decompositions which would be shared across applications when the in memory array and the NetCDF variable are the same shape. So in addition to separating the modules, I think there could be value in having a common decomposition object, where each application declares pointers which are associated with the common decomposition objects.
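
To illustrate the kind of thing I mean (all names here are hypothetical, not from the PR):

    ! A shared module owns the common decomposition object (io_decomp_t is hypothetical):
    type(io_decomp_t), target, save :: io_decomp
    ! ... and each input/output module associates a local pointer with the member it needs:
    type(decomp_real32_t), pointer :: output_decomp   ! hypothetical type
    output_decomp => io_decomp%land_to_x_y_real32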

how much does this change if we remove the capability to write out in different formats, and force an x-y grid format

If you enforce an x-y grid for all I/O then you should only be left with decompositions which involve x_y in the name.

Comment on lines +63 to +70
IF (mpi_grp%size == 1 .OR. .NOT. cable_user%mpi_legacy) THEN
CALL serialdrv(NRRRR, dels, koffset, kend, GSWP_MID, PLUME, CRU, site, mpi_grp)
ELSE
IF (mpi_grp%rank == 0) THEN
-           CALL mpidrv_master(mpi_grp%comm%mpi_val, dels, koffset, kend, PLUME, CRU)
+           CALL mpi_grp%split(COLOR_MASTER, mpi_grp%rank, mpi_grp_master)
+           CALL mpidrv_master(mpi_grp%comm%mpi_val, dels, koffset, kend, PLUME, CRU, mpi_grp_master)
ELSE
CALL mpi_grp%split(COLOR_WORKER, mpi_grp%rank, mpi_grp_worker)
@Whyborn (Contributor) commented:

Is this splitting of MPI master and worker a temporary fix while the rest of the MPI implementation is worked out? The plan is still to collapse the drivers into one, right?

@SeanBryan51 (Collaborator, Author) replied on Oct 28, 2025:

Yes, we are definitely aiming to have only a single driver (this would be the serial driver). The plan is to deprecate the legacy MPI implementation and keep it around until we are confident the new implementation works correctly.

So not really collapsing the drivers per se, but getting to a point where we can safely remove mpidrv_master and mpidrv_worker.

! count to obtain 'landpt_global', 'max_vegpatches' and 'mp_global'
CALL countPatch(nlon, nlat, npatch)

CALL init_local_structure_variables(mpi_grp)
@Whyborn (Contributor) commented:

Why is this called here? It appears that it only depends on mpi_grp (I know that's not actually the case and it uses global variables, but if it really only depended on mpi_grp we could justify putting it anywhere). It seems a bit clunky to have to pass mpi_grp down the call stack just to pass it to this function.

@SeanBryan51 (Collaborator, Author) replied on Oct 28, 2025:

I agree, it's a pretty weird place to call init_local_structure_variables; it's definitely a sign we need to pull some things out of load_parameters. It was put here because init_local_structure_variables needs to be called after countPatch (which initialises the rest of the global structure variables), and the local structure variables need to be initialised before allocate_cable_vars.
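
i.e. the ordering constraint is roughly (sketch):

    CALL countPatch(nlon, nlat, npatch)            ! fills landpt_global, max_vegpatches, mp_global
    CALL init_local_structure_variables(mpi_grp)   ! needs the global structures set by countPatch
    ! allocate_cable_vars must come after this, as it needs the local structures (mp, mland, landpt, ...)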
