Solver improvements #177

Open
thomasmelvin wants to merge 26 commits into MetOffice:main from thomasmelvin:solver_improvements

@thomasmelvin commented Jan 22, 2026

PR Summary

Sci/Tech Reviewer: @tommbendall
Code Reviewer: @mo-rickywong

Add a number of solver optimisations.
The principal performance improvement in this pull request comes from splitting the application of the mixed operator into two separate new kernels.

  1. science/gungho/source/kernel/solver/apply_mixed_u_operator_kernel_mod.F90 computes the lhs for the horizontal wind components. This is done in the broken W2h space so that a write access can be used, avoiding any colouring or halo swaps. After this call, the broken W2h lhs needs to be assembled in the continuous W2h space (which reintroduces a single halo swap).
  2. science/gungho/source/kernel/solver/apply_mixed_wp_operator_kernel_mod.F90 computes the lhs for the vertical wind and pressure components. As these fields lie on horizontally discontinuous spaces, no colouring or halo exchanges are needed.

These changes result in a performance improvement for two main reasons:

  1. Better use of memory: splitting the single large kernel into two kernels (presumably) gives better cache usage.
  2. A reduction in the number of halo swaps from 3 (for the horizontal wind, vertical wind & pressure) to 1 (for the broken horizontal wind lhs). It appears that one of these saved halo exchanges is then reinstated elsewhere in the code, likely when the pressure lhs or vertical wind lhs is needed in the halo region in a separate kernel.
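
The idea of writing into a broken space and then assembling can be illustrated with a toy example. The following is a minimal Python sketch, not the LFRic kernels: it assumes a periodic 1D mesh where each cell owns two face values, and it assumes that assembly is a plain sum of the two broken values sharing a face. All function names here are hypothetical. The incrementing version updates shared entries from neighbouring cells (the pattern that needs colouring when threaded), while the broken version lets each cell write only its own storage.

```python
def local_faces(cell, ncells):
    """Global indices of the (left, right) faces of a cell on a periodic 1D mesh."""
    return cell, (cell + 1) % ncells


def apply_incrementing(local_ops, x, ncells):
    """Accumulate directly into the continuous field: neighbouring cells
    update the same shared face entry."""
    y = [0.0] * ncells
    for cell in range(ncells):
        faces = local_faces(cell, ncells)
        for i, face_i in enumerate(faces):
            for j, face_j in enumerate(faces):
                y[face_i] += local_ops[cell][i][j] * x[face_j]
    return y


def apply_broken(local_ops, x, ncells):
    """Each cell writes only to its own pair of broken face values."""
    broken = [[0.0, 0.0] for _ in range(ncells)]
    for cell in range(ncells):
        faces = local_faces(cell, ncells)
        for i in range(2):
            broken[cell][i] = sum(local_ops[cell][i][j] * x[faces[j]] for j in range(2))
    return broken


def assemble(broken, ncells):
    """Assemble the continuous field by summing the two broken values on each
    face (summation is an assumption of this sketch)."""
    y = [0.0] * ncells
    for face in range(ncells):
        left_of_current = broken[face][0]
        right_of_previous = broken[(face - 1) % ncells][1]
        y[face] = left_of_current + right_of_previous
    return y


ncells = 4
x = [1.0, 2.0, 3.0, 4.0]
# Integer-valued entries keep the floating-point comparison exact.
local_ops = [[[float(c + 1), 2.0], [3.0, float(c + 2)]] for c in range(ncells)]

assert assemble(apply_broken(local_ops, x, ncells), ncells) == apply_incrementing(local_ops, x, ncells)
```

In a distributed run the assembly step would need the neighbouring partition's broken values, which is where the single remaining halo swap comes from; that communication is not modelled in this sketch.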

The C224 & C896 lfric_atm tests in the test suite were run with these changes, giving the following solver times:

Code    C224    C896
Trunk   50.71   114.57
Branch  41.59   98.69
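
For reference, the relative reduction implied by these figures can be worked out with a few lines of Python. This is only arithmetic on the numbers in the table above; the function name is hypothetical and the units are whatever the test suite reports.

```python
# Solver times copied from the table above.
solver_times = {
    "C224": {"trunk": 50.71, "branch": 41.59},
    "C896": {"trunk": 114.57, "branch": 98.69},
}


def relative_reduction(trunk, branch):
    """Fractional reduction in solver time going from trunk to branch."""
    return (trunk - branch) / trunk


for config, times in solver_times.items():
    reduction = relative_reduction(times["trunk"], times["branch"])
    print(f"{config}: {100.0 * reduction:.1f}% less solver time than trunk")
```

This gives a reduction of roughly 18% for C224 and roughly 14% for C896.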

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Core rose-stem suite
  • If required (e.g. API changes) I have also run the LFRic Apps test suite using this branch
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

These results are from before the KGO update. The lfric_inputs failure does not appear to be caused by this pull request, since none of the changed code should be used by that task; it is likely one of the occasional lfric_inputs failures that we see.

Test Suite Results - lfric_apps - solver_improvements/run6

Suite Information

Item            Value
Suite Name      solver_improvements/run6
Suite User      thomas.melvin
Workflow Start  2026-01-22T11:45:40
Groups Run      all

Dependency         Reference                                    Main Like
casim              MetOffice/casim@2025.12.1                    True
jules              MetOffice/jules@2025.12.1                    True
lfric_apps         thomasmelvin/lfric_apps@solver_improvements  False
lfric_core         MetOffice/lfric_core@2025.12.1               True
moci               MetOffice/moci@2025.12.1                     True
SimSys_Scripts     MetOffice/SimSys_Scripts@2025.12.1           True
socrates           MetOffice/socrates@2025.12.1                 True
socrates-spectral  MetOffice/socrates-spectral@2025.12.1        True
ukca               MetOffice/ukca@2025.12.1                     True

Task Information

❌ failed tasks - 150
Task State
check_gungho_model_agnesi_hyd_cart-BiP120x8-2000x2000_azspice_gnu_fast-debug-64bit failed
check_gungho_model_agnesi_hyd_cart-BiP120x8-2000x2000_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt1-C24s_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt1-C24s_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt2-C24_MG_op_azspice_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt2-C24_MG_op_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-alt3-C24_MG_azspice_gnu_fast-debug-64bit-rtran32 failed
check_gungho_model_baroclinic-alt3-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-pert-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_baroclinic-pert-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_bryan_fritsch-dry-BiP200x10-100x100_azspice_gnu_fast-debug-64bit failed
check_gungho_model_bryan_fritsch-dry-BiP200x10-100x100_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_dcmip200-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_dcmip200-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_dcmip200_realorog-C48_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_dcmip200_realorog-C48_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_dcmip301-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_dcmip301-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_deep-hot-jupiter-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_deep-hot-jupiter-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_earth-like-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_earth-like-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_held-suarez-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_held-suarez-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_lfric-real-domain-C48_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_lfric-real-domain-C48_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_robert-moist-lam-BiP100x8-10x10_azspice_gnu_fast-debug-64bit failed
check_gungho_model_robert-moist-lam-BiP100x8-10x10_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_robert-moist-smag-BiP100x8-10x10_azspice_gnu_fast-debug-64bit failed
check_gungho_model_robert-moist-smag-BiP100x8-10x10_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_sbr-C24_MG_azspice_gnu_fast-debug-64bit failed
check_gungho_model_sbr-C24_MG_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_sbr-alt2-C24_MG_op_azspice_gnu_fast-debug-64bit failed
check_gungho_model_sbr-alt2-C24_MG_op_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_sbr-alt3-C24_MG_azspice_gnu_fast-debug-64bit-rtran32 failed
check_gungho_model_sbr-alt3-C24_MG_ex1a_gnu_fast-debug-64bit-rtran32 failed
check_gungho_model_sbr_lam-n96_MG_lam_azspice_gnu_fast-debug-64bit failed
check_gungho_model_sbr_lam-n96_MG_lam_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_sbr_lam-n96_MG_lam_rotate_azspice_gnu_fast-debug-64bit failed
check_gungho_model_sbr_lam-n96_MG_lam_rotate_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_schar_cart-BiP200x8-500x500_azspice_gnu_fast-debug-64bit failed
check_gungho_model_schar_cart-BiP200x8-500x500_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_schar_cart-alt2-BiP100x4-1000x1000_azspice_gnu_fast-debug-64bit failed
check_gungho_model_schar_cart-alt2-BiP100x4-1000x1000_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_semi-implicit-for-linear-C12_azspice_gnu_fast-debug-64bit failed
check_gungho_model_semi-implicit-for-linear-C12_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_shallow-hot-jupiter-C24_MG_azspice_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_shallow-hot-jupiter-C24_MG_ex1a_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_skamarock_klemp_gw_p0-BiP300x8-1000x2000_azspice_gnu_fast-debug-64bit failed
check_gungho_model_skamarock_klemp_gw_p0-BiP300x8-1000x2000_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-BiP256x8-200x200_azspice_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-BiP256x8-200x200_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt1-BiP256x4-200x200_azspice_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt1-BiP256x4-200x200_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt2-BiP256x16-200x50_op_azspice_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt2-BiP256x16-200x50_op_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_straka_200m-alt3-BiP256x8-200x200_azspice_gnu_fast-debug-64bit-rtran32 failed
check_gungho_model_straka_200m-alt3-BiP256x8-200x200_ex1a_gnu_fast-debug-64bit failed
check_gungho_model_tidally-locked-earth-C24_MG_azspice_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_tidally-locked-earth-C24_MG_ex1a_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_tidally-locked-earth-C24s_rot_MG_azspice_gnu_fast-debug-64bit-crun1 failed
check_gungho_model_tidally-locked-earth-C24s_rot_MG_ex1a_gnu_fast-debug-64bit-crun1 failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_azspice_gnu_fast-debug-64bit failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_azspice_gnu_full-debug-64bit failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_ex1a_cce_fast-debug-64bit failed
check_jedi_lfric_tests_nwp_gal9-C12_azspice_gnu_fast-debug-64bit failed
check_jedi_lfric_tests_nwp_gal9-C12_ex1a_cce_fast-debug-64bit failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_azspice_gnu_fast-debug-64bit failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_ex1a_cce_fast-debug-64bit failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_op_azspice_gnu_fast-debug-64bit failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_op_ex1a_cce_fast-debug-64bit failed
check_lfric_atm_aquaplanet-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_aquaplanet-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_camembert_case3_gj1214b-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_camembert_case3_gj1214b-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9_1T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_1T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_2T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_2T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_4T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_chem-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9_chem-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_clim_gal9_chem_1T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_chem_2T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_comp_tran_ref_3d_l120-BiP64x64-1500x1500_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_hd209458b-C24_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_hd209458b-C24_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_casim-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_casim-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_coma9-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_coma9-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_comorph_dev-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_comorph_dev-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_comorph_tb-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_azspice_gnu_fast-debug-64bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_ex1a_cce_fast-debug-64bit-crun1 failed
check_lfric_atm_nwp_gal9-C48_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9-pert-C12_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9-pert-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero-C48_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero_threaded-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero_threaded-C48_MG_ex1a_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_da-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_da-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_debug-C12_azspice_gnu_full-debug-32bit failed
check_lfric_atm_nwp_gal9_debug-C12_ex1a_cce_full-debug-32bit failed
check_lfric_atm_nwp_gal9_debug-C48_MG_azspice_gnu_full-debug-32bit failed
check_lfric_atm_nwp_gal9_debug-C48_MG_ex1a_cce_full-debug-32bit failed
check_lfric_atm_nwp_gal9_eda-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_eda-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_eda_jada-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_eda_jada-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_mol-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_mol-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_noukca_1T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_1T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_2T-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_2T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_2T-C48_MG_ex1a_cce_full-debug-32bit failed
check_lfric_atm_nwp_gal9_noukca_4T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_short-C12_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_short-C12_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_ral3-seuk_MG_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3-seuk_MG_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3_ens-seuk_MG_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3_ens-seuk_MG_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3_mixmol-seuk_MG_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_ral3_mixmol-seuk_MG_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_rce-BiP64x64-1500x1500_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_rce-BiP64x64-1500x1500_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_thai_ben1-C48_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_thai_ben1-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_coupled_nwp_gal9-C48_ex1a_cce_fast-debug-64bit failed
check_linear_model_dcmip301-C24_azspice_gnu_fast-debug-64bit failed
check_linear_model_dcmip301-C24_ex1a_gnu_fast-debug-64bit failed
check_linear_model_nwp_gal9-C12_MG_azspice_gnu_fast-debug-64bit failed
check_linear_model_nwp_gal9-C12_MG_ex1a_gnu_fast-debug-64bit failed
check_linear_model_nwp_gal9_random-C12_MG_azspice_gnu_fast-debug-64bit failed
check_linear_model_nwp_gal9_random-C12_MG_ex1a_gnu_fast-debug-64bit failed
check_linear_model_semi-implicit-C12_azspice_gnu_fast-debug-64bit failed
check_linear_model_semi-implicit-C12_ex1a_gnu_fast-debug-64bit failed
rose_ana_lfricinputs_um2lfric-protogal_chem-N48L70_C12L70_azspice_gnu_full-debug-64bit failed
✅ succeeded tasks - 1305
⌛ waiting tasks - 2
Task State
housekeep_azspice waiting
housekeep_ex1a waiting

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@github-actions github-actions bot added the cla-required The CLA has not yet been signed by the author of this PR - added by GA label Jan 22, 2026
@github-actions github-actions bot added cla-signed The CLA has been signed as part of this PR - added by GA and removed cla-required The CLA has not yet been signed by the author of this PR - added by GA labels Jan 22, 2026
@thomasmelvin
Author

A selection of results comparing to stable

Test                   Branch                      Trunk
Baroclinic wave (C48)  branch-baroclinic-C48       trunk-baroclinic-C48
3D Warm Bubble         branch-warm-bubble          trunk-warm-bubble
SBR (C24)              branch-sbr-C24              trunk-sbr-C24
NWP-GAL9 (C48)         branch-nwp-gal9-u_in_w3     trunk-nwp-gal9-u_in_w3
RAL3-SEUK              branch-ral3-seuk-u_in_w3    trunk-ral3-seuk-u_in_w3

@thomasmelvin thomasmelvin added the KGO This PR contains changes to KGO label Jan 23, 2026
@thomasmelvin thomasmelvin added this to the Spring 2026 milestone Jan 23, 2026
Contributor

@tommbendall tommbendall left a comment

This is a really good speed-up. I'm happy that this is scientifically equivalent to main and the KGOs are only changing from bit-wise differences in the kernels.

I am generally happy -- the new mixed solver kernels have been literally taken from the existing kernel and split into two parts, making this easy to review.

My main thoughts that aren't captured through my comments on the code:

  1. Do you prefer to keep the old apply_mixed_operator_kernel_mod.F90 file, which I think is now unused? Or could it be removed?
  2. For the assemble_w2h_from_w2hb_kernel, there is already a very similar kernel in the core repo: https://github.com/MetOffice/lfric_core/blob/main/components/science/source/kernel/inter_function_space/sci_average_w2b_to_w2_kernel_mod.F90 . I think it could be better to modify the existing kernel to work for both W2H and W2, rather than add a new kernel here? (appreciating that you'd rather not make this a linked ticket, so feel free to refuse)
  3. The timers will conflict with the changes from #68 and #176 -- so I'm suggesting reverting these changes

@@ -0,0 +1,86 @@
!-----------------------------------------------------------------------------
! Copyright (c) 2017, Met Office, on behalf of HMSO and Queen's Printer
Contributor

This looks like the old copyright statement

if ( LPROF ) call start_timing( id, 'mixed_operator' )
type(r_solver_field_type) :: y_uv_broken

if ( subroutine_timers ) call timer('mixed_solver.operator')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here

@@ -418,7 +414,7 @@ contains
call invoke( inc_X_times_Y(rhs, h_diag) )
end if

if ( LPROF ) call stop_timing( id, 'mixed_schur_rhs' )
if ( subroutine_timers ) call timer('schur_precon.rhs')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here


if ( LPROF ) call start_timing( id, 'schur_back_substitute' )
type(r_solver_field_type), target :: uvw_norm
if ( subroutine_timers ) call timer('schur_precon.back_sub')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here

@@ -507,7 +504,7 @@ contains
state_p => state%get_field_from_position(isol_p)
call invoke( setval_X(state_p, exner_inc) )

if ( LPROF ) call stop_timing( id, 'schur_back_substitute' )
if ( subroutine_timers ) call timer('schur_precon.back_sub')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here


if ( LPROF ) call start_timing( id, 'helmholtz_lhs' )
if ( subroutine_timers ) call timer('pressure_solver.helmholtz_lhs')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here

@@ -242,7 +241,7 @@ contains
nullify( w3_mask, w2_mask )
end if
nullify( x_vec, y_vec )
if ( LPROF ) call stop_timing( id, 'helmholtz_lhs' )
if ( subroutine_timers ) call timer('pressure_solver.helmholtz_lhs')
Contributor

The timer change will conflict with #68 and #176 so I suggest reverting it here

thomasmelvin and others added 5 commits January 27, 2026 15:52
Co-authored-by: Thomas Bendall <14180399+tommbendall@users.noreply.github.com>
Reverting back to stable version
…_kernel_mod_test.pf

Co-authored-by: Thomas Bendall <14180399+tommbendall@users.noreply.github.com>
…kernel_mod_test.pf

Co-authored-by: Thomas Bendall <14180399+tommbendall@users.noreply.github.com>
@thomasmelvin
Author

> This is a really good speed-up. I'm happy that this is scientifically equivalent to main and the KGOs are only changing from bit-wise differences in the kernels.
>
> I am generally happy -- the new mixed solver kernels have been literally taken from the existing kernel and split into two parts, making this easy to review.
>
> My main thoughts that aren't captured through my comments on the code:
>
>   1. Do you prefer to keep the old apply_mixed_operator_kernel_mod.F90 file, which I think is now unused? Or could it be removed?
>   2. For the assemble_w2h_from_w2hb_kernel, there is already a very similar kernel in the core repo: https://github.com/MetOffice/lfric_core/blob/main/components/science/source/kernel/inter_function_space/sci_average_w2b_to_w2_kernel_mod.F90 . I think it could be better to modify the existing kernel to work for both W2H and W2, rather than add a new kernel here? (appreciating that you'd rather not make this a linked ticket, so feel free to refuse)
>   3. The timers will conflict with the changes from #68 (Adding Timing Wrapper calls throughout LFRic) and #176 (Calipers performance25) -- so I'm suggesting reverting these changes

  1. Yes, I think it's best to keep the old one (though I have removed the algorithm-level code for using it). My reasoning is that it is not clear that splitting the kernels is the best solution in the long run (e.g. if architectures/compilers change), so we may wish to go back to the old version.
  2. I did consider that, but two differences made me decide on a new kernel. Firstly, the new kernel doesn't require the vertical loop (this could have been fixed by adding an if test). Secondly, the new kernel doesn't require a multiplicity, and I couldn't think of a way of easily generalising this without passing in a dummy field of ones, which I didn't want to do in a performance-sensitive part of the code.
  3. I've reverted to the old timer names.
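
The trade-off in point 2 can be sketched in a few lines of Python. This is a toy illustration with hypothetical function names, and it assumes that the existing averaging kernel divides each summed value by a per-DoF multiplicity; the real kernels may differ in detail. Reusing an averaging routine for a plain sum means supplying a multiplicity of ones, which adds a field read and a division per DoF.

```python
def assemble_with_multiplicity(broken_values_per_dof, multiplicity):
    """Average the broken values that share each continuous DoF
    (assumed form of the existing averaging kernel)."""
    return [sum(values) / m for values, m in zip(broken_values_per_dof, multiplicity)]


def assemble_sum(broken_values_per_dof):
    """Sum the broken values that share each continuous DoF,
    with no multiplicity field needed."""
    return [sum(values) for values in broken_values_per_dof]


# Two continuous DoFs, each shared by two broken values.
broken = [[1.0, 3.0], [2.0, 6.0]]

averaged = assemble_with_multiplicity(broken, [2.0, 2.0])
summed_via_ones = assemble_with_multiplicity(broken, [1.0, 1.0])  # dummy field of ones
summed = assemble_sum(broken)

assert averaged == [2.0, 4.0]
assert summed_via_ones == summed == [4.0, 8.0]
```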

Contributor

@tommbendall tommbendall left a comment

Thanks for the reply. I'm happy with the changes in that case, so this passes science review.

This is now ready for code review @mo-rickywong

Contributor

@mo-rickywong mo-rickywong left a comment

Large number of KGO changes (to be expected).

  • Minor copyright issues
  • The main issue is that the expected answers in the unit tests are calculated. A unit test should check a known output for a known input, so known-correct answers should generally be hardcoded, not calculated.
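
The distinction between calculated and hardcoded expected answers can be shown with a small Python sketch. The toy function below only mimics the shape of one of the expressions in the unit tests under review, and its name and input values are hypothetical. In the first test, the expected value repeats the kernel's own formula, so a mistake in the formula would be mirrored in the test; in the second, the expected value is worked out by hand and written as a literal.

```python
def toy_kernel(nu, mu, u, grad, p):
    """Toy stand-in for a kernel under test: nu * (mu*u - grad*p)."""
    return nu * (mu * u - grad * p)


def test_calculated_expectation():
    # Discouraged: the expectation re-implements the formula being tested.
    nu, mu, u, grad, p = 2.0, 3.0, 4.0, 0.5, 6.0
    expected = nu * (mu * u - grad * p)
    assert abs(toy_kernel(nu, mu, u, grad, p) - expected) < 1.0e-12


def test_hardcoded_expectation():
    # Preferred: known output for a known input, 2*(3*4 - 0.5*6) = 18.
    expected = 18.0
    assert abs(toy_kernel(2.0, 3.0, 4.0, 0.5, 6.0) - expected) < 1.0e-12


test_calculated_expectation()
test_hardcoded_expectation()
```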

General comment: the short variable names are vague and hard to decipher.

Comment on lines +266 to +267
! halo exchange.
! The second step computes the vertical wind (yvec_w) & pressure (yvec_p) lhs, since these
Contributor

Suggested change
! halo exchange.
! The second step computes the vertical wind (yvec_w) & pressure (yvec_p) lhs, since these
! halo exchange.
!
! The second step computes the vertical wind (yvec_w) & pressure (yvec_p) lhs, since these

Comment on lines +268 to +271
! are horizontally discontinuous fields then there are no further halo exchanges required.
! Alternatively this code could be computed in a single kernel
! (apply_mixed_operator_kernel_type), however this appears to have poorer cache usage,
! resulting in an increased computation time
Contributor

Suggested change
! are horizontally discontinuous fields then there are no further halo exchanges required.
! Alternatively this code could be computed in a single kernel
! (apply_mixed_operator_kernel_type), however this appears to have poorer cache usage,
! resulting in an increased computation time
! are horizontally discontinuous fields then there are no further halo exchanges required.

Remove the last comment. It's a development comment/suggestion, which shouldn't be on main; comments should describe the state of the code on trunk.

! resulting in an increased computation time

! Create broken y_uv
w2hb_fs => function_space_collection%get_fs(mesh, 0, 0, W2Hbroken)
Contributor

Hardcoded magic numbers

@@ -0,0 +1,156 @@
!-----------------------------------------------------------------------------
! (C) Crown copyright 2026 Met Office. All rights reserved.
Contributor

Suggested change
! (C) Crown copyright 2026 Met Office. All rights reserved.
! (C) Crown copyright Met Office. All rights reserved.

Comment on lines +128 to +145
t(:) = 0.0_r_solver
do k = 1, nlayers
t(k) = t(k) - Pt2(k,1,3)*w(k) - Pt2(k,1,4)*w(k+1)
t(k+1) = t(k+1) - Pt2(k,2,3)*w(k) - Pt2(k,2,4)*w(k+1)
end do
t(:) = invMt(:)*t(:)

answer(:) = 0.0_r_solver
do k = 0, nlayers - 1
do dfv = 1,ndf_w2v
u = (/ uv(1+k), uv(4+k), w(1+k), w(2+k) /)
df = ndf_w2h+dfv
answer(map_w2v(dfv)+k) = answer(map_w2v(dfv)+k) &
+ Nu(map_w2(df)+k)*(sum(Mu(k+1,df,:)*u) &
- sum(P2t(k+1,df,:)*t(k+1:k+2)) &
- Grad(k+1,df,1)*p(1+k))
end do
end do
Contributor

Answers shouldn't be calculated, hardcode expected output.

Comment on lines +150 to +151
u = (/ uv(1), uv(4), w(1), w(2) /)
answer(1) = M3p(1,1,1)*p(1) - sum(P3t(1,1,:)*t(1:2)) + sum(Q32(1,1,:)*u(1:4))
Contributor

Answers shouldn't be calculated, hardcode expected output.

Comment on lines +95 to +98
do df = 1,ndf_w2h
answer = Nu(df)*(Mu(df,df,1)*uv(df) - Grad(df,1,1)*p(1))
@assertEqual(answer, Luv(df), tol)
end do
Contributor

Correct answers shouldn't be calculated, should be hardcoded

Comment on lines 95 to 98
do df = 1, ndf_w2h
answer = mask(df)*( 0.5_r_solver*rhs(df) + y(df)*z(df)*grad(1,df,1)*x(1))
answer = mask(df)*( 0.5_r_solver*rhs(df) + y(df)*grad(1,df,1)*x(1))
@assertEqual(answer, Luv(df), tol)
end do
Contributor

As before, known answers shouldn't be calculated; that's what the kernel under test is for.

Co-authored-by: Ricky Wong <141156427+mo-rickywong@users.noreply.github.com>
thomasmelvin and others added 3 commits February 6, 2026 11:54
…ner_alg_mod.x90

Co-authored-by: Ricky Wong <141156427+mo-rickywong@users.noreply.github.com>
…2hb_kernel_mod.F90

Co-authored-by: Ricky Wong <141156427+mo-rickywong@users.noreply.github.com>
…2hb_kernel_mod.F90

Co-authored-by: Ricky Wong <141156427+mo-rickywong@users.noreply.github.com>
Labels

cla-signed The CLA has been signed as part of this PR - added by GA KGO This PR contains changes to KGO
