Solver improvements by thomasmelvin · Pull Request #177 · MetOffice/lfric_apps

thomasmelvin · 2026-01-22T15:46:28Z

PR Summary

Sci/Tech Reviewer: @tommbendall
Code Reviewer: @mo-rickywong

Adds a number of solver optimisations.
The principal performance improvement in this pull request is to split the application of the mixed operator into two separate new kernels.
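In block-matrix form, the split looks roughly like this. The sketch assumes the mixed operator can be written as a block matrix acting on the horizontal wind (u), vertical wind (w) and pressure (p) unknowns. The PR does not say which blocks are nonzero, and the symbols are illustrative rather than taken from the code. Splitting by output rows does not change the result. One kernel produces the u rows of the lhs and the other produces the (w, p) rows, and each kernel reads whichever inputs its rows couple to.

```latex
\begin{aligned}
\mathbf{y} &= \mathsf{A}\,\mathbf{x},
\qquad
\mathsf{A} =
\begin{pmatrix}
  \mathsf{A}_{uu} & \mathsf{A}_{uw} & \mathsf{A}_{up} \\
  \mathsf{A}_{wu} & \mathsf{A}_{ww} & \mathsf{A}_{wp} \\
  \mathsf{A}_{pu} & \mathsf{A}_{pw} & \mathsf{A}_{pp}
\end{pmatrix},
\quad
\mathbf{x} = \begin{pmatrix} \mathbf{x}_u \\ \mathbf{x}_w \\ \mathbf{x}_p \end{pmatrix}
\\[4pt]
\text{u kernel:}\quad \mathbf{y}_u &= \mathsf{A}_{uu}\mathbf{x}_u + \mathsf{A}_{uw}\mathbf{x}_w + \mathsf{A}_{up}\mathbf{x}_p
\\
\text{wp kernel:}\quad \mathbf{y}_w &= \mathsf{A}_{wu}\mathbf{x}_u + \mathsf{A}_{ww}\mathbf{x}_w + \mathsf{A}_{wp}\mathbf{x}_p,
\qquad
\mathbf{y}_p = \mathsf{A}_{pu}\mathbf{x}_u + \mathsf{A}_{pw}\mathbf{x}_w + \mathsf{A}_{pp}\mathbf{x}_p
\end{aligned}
```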

science/gungho/source/kernel/solver/apply_mixed_u_operator_kernel_mod.F90 computes the lhs for the horizontal wind components. This is done in the broken W2h space so that write access can be used, which avoids any colouring or halo swaps. After this call, the broken W2h lhs needs to be assembled into the continuous W2h space, which reintroduces a single halo swap.
science/gungho/source/kernel/solver/apply_mixed_wp_operator_kernel_mod.F90 computes the lhs for the vertical wind and pressure components. These fields lie on horizontally discontinuous spaces, so no colouring or halo exchanges are needed.
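The broken-space approach can be illustrated with a toy one-dimensional Python sketch. This is not LFRic or PSyclone code: `cell_dofmap`, `local_lhs` and the 2x2 local matrix are hypothetical stand-ins. Each cell has two face DOFs. Neighbouring cells share a face in the continuous space but not in the broken space. Writing per-cell results into the broken field therefore touches disjoint entries, which is why write access is safe without colouring. The assembly step then sums the copies that belong to each continuous DOF. Here the assembly is written as a gather over continuous DOFs so that each output entry is independent. The PR does not describe how the real assembly kernel is structured or how it obtains halo contributions.

```python
"""Toy 1D comparison: increment into a continuous space vs write into a
broken space followed by assembly. Illustrative only; not LFRic code."""

NDF = 2  # local DOFs per cell: 0 = left face, 1 = right face


def cell_dofmap(ncells):
    """Continuous face index of each cell's local DOFs (faces shared by neighbours)."""
    return [[cell, cell + 1] for cell in range(ncells)]


def local_lhs(u_local):
    """Hypothetical per-cell operator: a fixed 2x2 local matrix times the local vector."""
    a = [[2.0, -1.0], [-1.0, 2.0]]
    return [sum(a[i][j] * u_local[j] for j in range(NDF)) for i in range(NDF)]


def apply_continuous(u, ncells):
    """Cells increment shared continuous DOFs directly. Neighbouring cells update
    the same face, so a parallel loop over cells would need colouring."""
    dofmap = cell_dofmap(ncells)
    lhs = [0.0] * (ncells + 1)
    for cell in range(ncells):
        contrib = local_lhs([u[d] for d in dofmap[cell]])
        for k in range(NDF):
            lhs[dofmap[cell][k]] += contrib[k]
    return lhs


def apply_broken(u, ncells):
    """Each cell writes only its own broken DOFs, so no two cells touch the same entry."""
    dofmap = cell_dofmap(ncells)
    broken = [0.0] * (NDF * ncells)
    for cell in range(ncells):
        broken[NDF * cell:NDF * (cell + 1)] = local_lhs([u[d] for d in dofmap[cell]])
    return broken


def assemble(broken, ncells):
    """Sum the broken copies belonging to each continuous DOF. This is a gather,
    so each continuous DOF can be computed independently."""
    dofmap = cell_dofmap(ncells)
    contributors = [[] for _ in range(ncells + 1)]
    for cell in range(ncells):
        for k in range(NDF):
            contributors[dofmap[cell][k]].append(NDF * cell + k)
    return [sum(broken[b] for b in idx) for idx in contributors]


u = [1.0, 3.0, -2.0, 0.5, 4.0]  # 4 cells -> 5 faces
print(apply_continuous(u, 4))
print(assemble(apply_broken(u, 4), 4))
```

Both routes give the same continuous lhs. They differ only in which loop has to deal with DOFs shared between cells: the expensive per-cell operator work, or a cheap summation afterwards.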

These changes improve performance for two main reasons:

Better memory use: splitting the single large kernel into two smaller kernels (presumably) gives better cache usage.
Fewer halo swaps: the number falls from 3 (for the horizontal wind, vertical wind and pressure) to 1 (for the broken horizontal wind lhs). One of these saved halo exchanges appears to be reinstated elsewhere in the code, likely where a separate kernel needs the pressure lhs or vertical wind lhs in the halo region. The net saving therefore appears to be one exchange.

The C224 and C896 lfric atm tests in the test suite were run with these changes, giving the following solver times:

Code	C224	C896
Trunk	50.71	114.57
Branch	41.59	98.69
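The relative changes implied by the table can be computed from the quoted numbers. The values below are rounded, and the units are those of the table.

```latex
\begin{aligned}
\text{C224:}\quad & \frac{50.71 - 41.59}{50.71} = \frac{9.12}{50.71} \approx 0.180 \;(18.0\%\ \text{less time}),
  & \frac{50.71}{41.59} &\approx 1.22\times \text{ speed-up} \\
\text{C896:}\quad & \frac{114.57 - 98.69}{114.57} = \frac{15.88}{114.57} \approx 0.139 \;(13.9\%\ \text{less time}),
  & \frac{114.57}{98.69} &\approx 1.16\times \text{ speed-up}
\end{aligned}
```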

Code Quality Checklist

I have performed a self-review of my own code
My code follows the project's style guidelines
Comments have been included that aid understanding and enhance the readability of the code
My changes generate no new warnings
All automated checks in the CI pipeline have completed successfully

Testing

I have tested this change locally, using the LFRic Core rose-stem suite
If required (e.g. API changes) I have also run the LFRic Apps test suite using this branch
If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

These results are from before the KGO update. The lfric_inputs failure appears not to be caused by this pull request, since that test should not exercise any of the changed code. It is likely one of the occasional lfric_inputs failures that we see.

Test Suite Results - lfric_apps - solver_improvements/run6

Suite Information

Item	Value
Suite Name	solver_improvements/run6
Suite User	thomas.melvin
Workflow Start	2026-01-22T11:45:40
Groups Run	all

Dependency	Reference	Main Like
casim	MetOffice/casim@2025.12.1	True
jules	MetOffice/jules@2025.12.1	True
lfric_apps	thomasmelvin/lfric_apps@solver_improvements	False
lfric_core	MetOffice/lfric_core@2025.12.1	True
moci	MetOffice/moci@2025.12.1	True
SimSys_Scripts	MetOffice/SimSys_Scripts@2025.12.1	True
socrates	MetOffice/socrates@2025.12.1	True
socrates-spectral	MetOffice/socrates-spectral@2025.12.1	True
ukca	MetOffice/ukca@2025.12.1	True

Task Information

❌ failed tasks - 150

Task	State
check_gungho_model_agnesi_hyd_cart-BiP120x8-2000x2000_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_agnesi_hyd_cart-BiP120x8-2000x2000_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-C24_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-alt1-C24s_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-alt1-C24s_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-alt2-C24_MG_op_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-alt2-C24_MG_op_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-alt3-C24_MG_azspice_gnu_fast-debug-64bit-rtran32	failed
check_gungho_model_baroclinic-alt3-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-pert-C24_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_baroclinic-pert-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_bryan_fritsch-dry-BiP200x10-100x100_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_bryan_fritsch-dry-BiP200x10-100x100_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_dcmip200-C24_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_dcmip200-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_dcmip200_realorog-C48_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_dcmip200_realorog-C48_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_dcmip301-C24_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_dcmip301-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_deep-hot-jupiter-C24_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_deep-hot-jupiter-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_earth-like-C24_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_earth-like-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_held-suarez-C24_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_held-suarez-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_lfric-real-domain-C48_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_lfric-real-domain-C48_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_robert-moist-lam-BiP100x8-10x10_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_robert-moist-lam-BiP100x8-10x10_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_robert-moist-smag-BiP100x8-10x10_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_robert-moist-smag-BiP100x8-10x10_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_sbr-C24_MG_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_sbr-C24_MG_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_sbr-alt2-C24_MG_op_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_sbr-alt2-C24_MG_op_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_sbr-alt3-C24_MG_azspice_gnu_fast-debug-64bit-rtran32	failed
check_gungho_model_sbr-alt3-C24_MG_ex1a_gnu_fast-debug-64bit-rtran32	failed
check_gungho_model_sbr_lam-n96_MG_lam_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_sbr_lam-n96_MG_lam_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_sbr_lam-n96_MG_lam_rotate_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_sbr_lam-n96_MG_lam_rotate_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_schar_cart-BiP200x8-500x500_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_schar_cart-BiP200x8-500x500_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_schar_cart-alt2-BiP100x4-1000x1000_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_schar_cart-alt2-BiP100x4-1000x1000_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_semi-implicit-for-linear-C12_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_semi-implicit-for-linear-C12_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_shallow-hot-jupiter-C24_MG_azspice_gnu_fast-debug-64bit-crun1	failed
check_gungho_model_shallow-hot-jupiter-C24_MG_ex1a_gnu_fast-debug-64bit-crun1	failed
check_gungho_model_skamarock_klemp_gw_p0-BiP300x8-1000x2000_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_skamarock_klemp_gw_p0-BiP300x8-1000x2000_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_straka_200m-BiP256x8-200x200_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_straka_200m-BiP256x8-200x200_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_straka_200m-alt1-BiP256x4-200x200_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_straka_200m-alt1-BiP256x4-200x200_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_straka_200m-alt2-BiP256x16-200x50_op_azspice_gnu_fast-debug-64bit	failed
check_gungho_model_straka_200m-alt2-BiP256x16-200x50_op_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_straka_200m-alt3-BiP256x8-200x200_azspice_gnu_fast-debug-64bit-rtran32	failed
check_gungho_model_straka_200m-alt3-BiP256x8-200x200_ex1a_gnu_fast-debug-64bit	failed
check_gungho_model_tidally-locked-earth-C24_MG_azspice_gnu_fast-debug-64bit-crun1	failed
check_gungho_model_tidally-locked-earth-C24_MG_ex1a_gnu_fast-debug-64bit-crun1	failed
check_gungho_model_tidally-locked-earth-C24s_rot_MG_azspice_gnu_fast-debug-64bit-crun1	failed
check_gungho_model_tidally-locked-earth-C24s_rot_MG_ex1a_gnu_fast-debug-64bit-crun1	failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_azspice_gnu_fast-debug-64bit	failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_azspice_gnu_full-debug-64bit	failed
check_jedi_lfric_tests_forecast_gh-si-for-linear-C12_ex1a_cce_fast-debug-64bit	failed
check_jedi_lfric_tests_nwp_gal9-C12_azspice_gnu_fast-debug-64bit	failed
check_jedi_lfric_tests_nwp_gal9-C12_ex1a_cce_fast-debug-64bit	failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_azspice_gnu_fast-debug-64bit	failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_ex1a_cce_fast-debug-64bit	failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_op_azspice_gnu_fast-debug-64bit	failed
check_jedi_lfric_tests_tlm_forecast_tl_default-C12_op_ex1a_cce_fast-debug-64bit	failed
check_lfric_atm_aquaplanet-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_aquaplanet-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_camembert_case3_gj1214b-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_camembert_case3_gj1214b-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_clim_gal9-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_clim_gal9-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_clim_gal9_1T-C12_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_clim_gal9_1T-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_clim_gal9_2T-C12_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_clim_gal9_2T-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_clim_gal9_4T-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_clim_gal9_chem-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_clim_gal9_chem-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_clim_gal9_chem_1T-C12_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_clim_gal9_chem_2T-C12_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_comp_tran_ref_3d_l120-BiP64x64-1500x1500_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_hd209458b-C24_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_hd209458b-C24_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_casim-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_casim-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_coma9-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_coma9-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_comorph_dev-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_comorph_dev-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_comorph_tb-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9-C12_azspice_gnu_fast-debug-64bit-crun1	failed
check_lfric_atm_nwp_gal9-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9-C12_ex1a_cce_fast-debug-64bit-crun1	failed
check_lfric_atm_nwp_gal9-C48_MG_azspice_gnu_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9-pert-C12_azspice_gnu_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9-pert-C12_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_coarse_aero-C48_MG_azspice_gnu_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_coarse_aero-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_coarse_aero_threaded-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_coarse_aero_threaded-C48_MG_ex1a_gnu_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_da-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9_da-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9_debug-C12_azspice_gnu_full-debug-32bit	failed
check_lfric_atm_nwp_gal9_debug-C12_ex1a_cce_full-debug-32bit	failed
check_lfric_atm_nwp_gal9_debug-C48_MG_azspice_gnu_full-debug-32bit	failed
check_lfric_atm_nwp_gal9_debug-C48_MG_ex1a_cce_full-debug-32bit	failed
check_lfric_atm_nwp_gal9_eda-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9_eda-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9_eda_jada-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9_eda_jada-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9_mol-C12_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9_mol-C12_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_nwp_gal9_noukca_1T-C12_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_noukca_1T-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_noukca_2T-C12_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_noukca_2T-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_noukca_2T-C48_MG_ex1a_cce_full-debug-32bit	failed
check_lfric_atm_nwp_gal9_noukca_4T-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_short-C12_azspice_gnu_fast-debug-32bit	failed
check_lfric_atm_nwp_gal9_short-C12_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_ral3-seuk_MG_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_ral3-seuk_MG_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_ral3_ens-seuk_MG_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_ral3_ens-seuk_MG_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_ral3_mixmol-seuk_MG_azspice_gnu_fast-debug-32bit-crun1	failed
check_lfric_atm_ral3_mixmol-seuk_MG_ex1a_cce_fast-debug-32bit-crun1	failed
check_lfric_atm_rce-BiP64x64-1500x1500_MG_azspice_gnu_fast-debug-32bit	failed
check_lfric_atm_rce-BiP64x64-1500x1500_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_atm_thai_ben1-C48_MG_azspice_gnu_fast-debug-32bit	failed
check_lfric_atm_thai_ben1-C48_MG_ex1a_cce_fast-debug-32bit	failed
check_lfric_coupled_nwp_gal9-C48_ex1a_cce_fast-debug-64bit	failed
check_linear_model_dcmip301-C24_azspice_gnu_fast-debug-64bit	failed
check_linear_model_dcmip301-C24_ex1a_gnu_fast-debug-64bit	failed
check_linear_model_nwp_gal9-C12_MG_azspice_gnu_fast-debug-64bit	failed
check_linear_model_nwp_gal9-C12_MG_ex1a_gnu_fast-debug-64bit	failed
check_linear_model_nwp_gal9_random-C12_MG_azspice_gnu_fast-debug-64bit	failed
check_linear_model_nwp_gal9_random-C12_MG_ex1a_gnu_fast-debug-64bit	failed
check_linear_model_semi-implicit-C12_azspice_gnu_fast-debug-64bit	failed
check_linear_model_semi-implicit-C12_ex1a_gnu_fast-debug-64bit	failed
rose_ana_lfricinputs_um2lfric-protogal_chem-N48L70_C12L70_azspice_gnu_full-debug-64bit	failed

✅ succeeded tasks - 1305

⌛ waiting tasks - 2

Task	State
housekeep_azspice	waiting
housekeep_ex1a	waiting

Security Considerations

I have reviewed my changes for potential security issues
Sensitive data is properly handled (if applicable)
Authentication and authorisation are properly implemented (if applicable)

Performance Impact

Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

I understand this area of code and the changes being added
The proposed changes correspond to the pull request description
Documentation is sufficient (do documentation papers need updating?)
Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

All dependencies have been resolved
Related Issues have been properly linked and addressed
CLA compliance has been confirmed
Code quality standards have been met
Tests are adequate and have passed
Documentation is complete and accurate
Security considerations have been addressed
Performance impact is acceptable

thomasmelvin · 2026-01-23T14:16:22Z

A selection of results comparing to stable. The original comment showed branch and trunk plots (not reproduced here) for:

Baroclinic wave (C48)
3D Warm Bubble
SBR (C24)
NWP-GAL9 (C48)
RAL3-SEUK

tommbendall

This is a really good speed-up. I'm happy that this is scientifically equivalent to main, and that the KGOs are changing only because of bit-level differences in the kernels.

I am generally happy. The new mixed solver kernels have been taken directly from the existing kernel and split into two parts, which makes this easy to review.

My main thoughts that aren't captured in my comments on the code:

Would you prefer to keep the old apply_mixed_operator_kernel_mod.F90 file, which I think is now unused, or could it be removed?
For the assemble_w2h_from_w2hb_kernel, there is already a very similar kernel in the core repo: https://github.com/MetOffice/lfric_core/blob/main/components/science/source/kernel/inter_function_space/sci_average_w2b_to_w2_kernel_mod.F90 . Could it be better to modify the existing kernel to work for both W2H and W2, rather than adding a new kernel here? (I appreciate that you'd rather not make this a linked ticket, so feel free to refuse.)
The timer changes will conflict with the changes from #68 and #176, so I suggest reverting them.
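The reviewer's suggestion rests on the fact that mapping a broken field back onto its continuous space depends only on the DOF map, not on which space is involved. Below is a hypothetical Python sketch of one routine that could serve any space. The `mode` switch rests on an assumption drawn only from the kernel names: that sci_average_w2b_to_w2_kernel averages the broken copies, while assemble_w2h_from_w2hb_kernel sums them, as an operator lhs requires. The function name, its arguments and the two toy DOF maps are illustrative and are not LFRic's API.

```python
def assemble_from_broken(broken, dofmap, ndofs, mode="sum"):
    """Map a broken (cell-local) field onto a continuous space.

    broken : flat, cell-major list with len(dofmap) * ndf entries
    dofmap : dofmap[cell][k] is the continuous index of local DOF k
    mode   : "sum" adds the copies (e.g. assembling an operator lhs);
             "average" takes their mean (e.g. reconciling a field's values)
    """
    if mode not in ("sum", "average"):
        raise ValueError(f"unknown mode: {mode!r}")
    ndf = len(dofmap[0])
    total = [0.0] * ndofs
    count = [0] * ndofs
    for cell, dofs in enumerate(dofmap):
        for k, d in enumerate(dofs):
            total[d] += broken[cell * ndf + k]
            count[d] += 1
    if mode == "average":
        # DOFs that no cell maps to are left at zero.
        return [t / c if c else 0.0 for t, c in zip(total, count)]
    return total


# Toy map A: three cells in a row, two faces per cell, neighbours share a face.
row_map = [[0, 1], [1, 2], [2, 3]]
# Toy map B: two side-by-side cells with four faces each (W, E, S, N);
# only face 1 is shared.
strip_map = [[0, 1, 2, 3], [1, 4, 5, 6]]

print(assemble_from_broken([1, 2, 3, 4, 5, 6], row_map, 4))
print(assemble_from_broken([1, 2, 3, 4, 5, 6, 7, 8], strip_map, 7, mode="average"))
```

The same loop handles both maps, whatever the number of DOFs per cell. Only the final step differs between summing and averaging.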

build/local_build.py

science/gungho/unit-test/kernel/solver/apply_mixed_wp_operator_kernel_mod_test.pf

science/gungho/unit-test/kernel/solver/apply_mixed_u_operator_kernel_mod_test.pf

tommbendall · 2026-01-27T14:47:06Z

science/gungho/unit-test/kernel/core_dynamics/assemble_w2h_from_w2hb_kernel_mod_test.pf

@@ -0,0 +1,86 @@
+!-----------------------------------------------------------------------------
+! Copyright (c) 2017,  Met Office, on behalf of HMSO and Queen's Printer


This looks like the old copyright statement.

tommbendall · 2026-01-27T14:54:21Z

science/gungho/source/algorithm/solver/mixed_operator_alg_mod.x90

-    if ( LPROF ) call start_timing( id, 'mixed_operator' )
+    type(r_solver_field_type) :: y_uv_broken
+
+    if ( subroutine_timers ) call timer('mixed_solver.operator')