Conversation
tommbendall
left a comment
This is a really good speed-up. I'm happy that this is scientifically equivalent to main and that the KGOs are only changing because of bit-wise differences in the kernels.
I am generally happy -- the new mixed solver kernels have been literally taken from the existing kernel and split into two parts, making this easy to review.
My main thoughts that aren't captured through my comments on the code:
- Do you prefer to keep the old apply_mixed_operator_kernel_mod.F90 file, which I think is now unused? Or could it be removed?
- For the assemble_w2h_from_w2hb_kernel, there is already a very similar kernel in the core repo: https://github.com/MetOffice/lfric_core/blob/main/components/science/source/kernel/inter_function_space/sci_average_w2b_to_w2_kernel_mod.F90 . I think it could be better to modify the existing kernel to work for both W2H and W2, rather than add a new kernel here? (appreciating that you'd rather not make this a linked ticket, so feel free to refuse)
- The timers will conflict with the changes from #68 and #176 -- so I'm suggesting reverting these changes
science/gungho/unit-test/kernel/solver/apply_mixed_wp_operator_kernel_mod_test.pf
science/gungho/unit-test/kernel/solver/apply_mixed_u_operator_kernel_mod_test.pf
| @@ -0,0 +1,86 @@ | |||
| !----------------------------------------------------------------------------- | |||
| ! Copyright (c) 2017, Met Office, on behalf of HMSO and Queen's Printer | |||
This looks like the old copyright statement
| if ( LPROF ) call start_timing( id, 'mixed_operator' ) | ||
| type(r_solver_field_type) :: y_uv_broken | ||
| if ( subroutine_timers ) call timer('mixed_solver.operator') |
| @@ -418,7 +414,7 @@ contains | |||
| call invoke( inc_X_times_Y(rhs, h_diag) ) | |||
| end if | |||
| if ( LPROF ) call stop_timing( id, 'mixed_schur_rhs' ) | |||
| if ( subroutine_timers ) call timer('schur_precon.rhs') | |||
| if ( LPROF ) call start_timing( id, 'schur_back_substitute' ) | ||
| type(r_solver_field_type), target :: uvw_norm | ||
| if ( subroutine_timers ) call timer('schur_precon.back_sub') |
| @@ -507,7 +504,7 @@ contains | |||
| state_p => state%get_field_from_position(isol_p) | |||
| call invoke( setval_X(state_p, exner_inc) ) | |||
| if ( LPROF ) call stop_timing( id, 'schur_back_substitute' ) | |||
| if ( subroutine_timers ) call timer('schur_precon.back_sub') | |||
| if ( LPROF ) call start_timing( id, 'helmholtz_lhs' ) | ||
| if ( subroutine_timers ) call timer('pressure_solver.helmholtz_lhs') |
| @@ -242,7 +241,7 @@ contains | |||
| nullify( w3_mask, w2_mask ) | |||
| end if | |||
| nullify( x_vec, y_vec ) | |||
| if ( LPROF ) call stop_timing( id, 'helmholtz_lhs' ) | |||
| if ( subroutine_timers ) call timer('pressure_solver.helmholtz_lhs') | |||
Co-authored-by: Thomas Bendall <14180399+tommbendall@users.noreply.github.com>
Reverting back to stable version
…_kernel_mod_test.pf Co-authored-by: Thomas Bendall <14180399+tommbendall@users.noreply.github.com>
…kernel_mod_test.pf Co-authored-by: Thomas Bendall <14180399+tommbendall@users.noreply.github.com>
tommbendall
left a comment
Thanks for the reply. I'm happy with the changes in that case, so this passes science review.
This is now ready for code review @mo-rickywong
mo-rickywong
left a comment
Large number of KGO changes (to be expected).
- Minor copyright issues
- Main issue is that the expected answers in the unit tests are calculated. A unit test should check a known output for a known input; generally, correct known answers should be hardcoded, not calculated.
General comment: the short variable names are vague and hard to decipher.
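A minimal sketch of the "known output for a known input" pattern, assuming a toy stand-in for a kernel: the module name, `apply_toy_operator`, its arguments and the input values are all hypothetical and are not the LFRic kernels under review. The point is that the expected values are worked out once by hand and written into the test as literals, rather than re-derived with the same formula the kernel implements.

```fortran
! Toy illustration only: all names and values are hypothetical, not LFRic code.
module toy_kernel_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Toy "kernel under test": y(df) = nu(df) * ( m(df)*u(df) - g(df)*p )
  pure function apply_toy_operator(nu, m, u, g, p) result(y)
    real(dp), intent(in) :: nu(:), m(:), u(:), g(:)
    real(dp), intent(in) :: p
    real(dp) :: y(size(u))
    y = nu*(m*u - g*p)
  end function apply_toy_operator

  ! Unit-test style check: fixed inputs, expected output hardcoded.
  function toy_operator_matches_known_output() result(ok)
    logical :: ok
    real(dp), parameter :: tol = 1.0e-12_dp
    ! Expected output worked out by hand for the fixed inputs below:
    !   y(1) = 1.0*(2.0*1.0 - 1.0*2.0) = 0.0
    !   y(2) = 2.0*(3.0*1.0 - 0.5*2.0) = 4.0
    real(dp), parameter :: expected(2) = [0.0_dp, 4.0_dp]
    real(dp) :: y(2)

    y = apply_toy_operator(nu=[1.0_dp, 2.0_dp], m=[2.0_dp, 3.0_dp], &
                           u=[1.0_dp, 1.0_dp], g=[1.0_dp, 0.5_dp], p=2.0_dp)
    ok = all(abs(y - expected) < tol)
  end function toy_operator_matches_known_output
end module toy_kernel_mod
```

In a pFUnit `.pf` test the final comparison would instead use the `@assertEqual(expected, actual, tol)` form already used in the tests above, with `expected` holding the hardcoded literals.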
| ! halo exchange. | ||
| ! The second step computes the vertical wind (yvec_w) & pressure (yvec_p) lhs, since these |
| ! halo exchange. | |
| ! The second step computes the vertical wind (yvec_w) & pressure (yvec_p) lhs, since these | |
| ! halo exchange. | |
| ! | |
| ! The second step computes the vertical wind (yvec_w) & pressure (yvec_p) lhs, since these |
| ! are horizontally discontinuous fields then there are no further halo exchanges required. | ||
| ! Alternatively this code could be computed in a single kernel | ||
| ! (apply_mixed_operator_kernel_type), however this appears to have poorer cache usage, | ||
| ! resulting in an increased computation time |
| ! are horizontally discontinuous fields then there are no further halo exchanges required. | |
| ! Alternatively this code could be computed in a single kernel | |
| ! (apply_mixed_operator_kernel_type), however this appears to have poorer cache usage, | |
| ! resulting in an increased computation time | |
| ! are horizontally discontinuous fields then there are no further halo exchanges required. |
Remove the last comment: it's a development comment/suggestion which shouldn't be on main. Comments should describe the state of the code on trunk.
| ! resulting in an increased computation time | ||
| ! Create broken y_uv | ||
| w2hb_fs => function_space_collection%get_fs(mesh, 0, 0, W2Hbroken) |
Hardcoded magic numbers.
science/gungho/source/algorithm/solver/mixed_schur_preconditioner_alg_mod.x90
| @@ -0,0 +1,156 @@ | |||
| !----------------------------------------------------------------------------- | |||
| ! (C) Crown copyright 2026 Met Office. All rights reserved. | |||
| ! (C) Crown copyright 2026 Met Office. All rights reserved. | |
| ! (C) Crown copyright Met Office. All rights reserved. |
| t(:) = 0.0_r_solver | ||
| do k = 1, nlayers | ||
| t(k) = t(k) - Pt2(k,1,3)*w(k) - Pt2(k,1,4)*w(k+1) | ||
| t(k+1) = t(k+1) - Pt2(k,2,3)*w(k) - Pt2(k,2,4)*w(k+1) | ||
| end do | ||
| t(:) = invMt(:)*t(:) | ||
| answer(:) = 0.0_r_solver | ||
| do k = 0, nlayers - 1 | ||
| do dfv = 1,ndf_w2v | ||
| u = (/ uv(1+k), uv(4+k), w(1+k), w(2+k) /) | ||
| df = ndf_w2h+dfv | ||
| answer(map_w2v(dfv)+k) = answer(map_w2v(dfv)+k) & | ||
| + Nu(map_w2(df)+k)*(sum(Mu(k+1,df,:)*u) & | ||
| - sum(P2t(k+1,df,:)*t(k+1:k+2)) & | ||
| - Grad(k+1,df,1)*p(1+k)) | ||
| end do | ||
| end do |
Answers shouldn't be calculated; hardcode the expected output.
| u = (/ uv(1), uv(4), w(1), w(2) /) | ||
| answer(1) = M3p(1,1,1)*p(1) - sum(P3t(1,1,:)*t(1:2)) + sum(Q32(1,1,:)*u(1:4)) |
Answers shouldn't be calculated; hardcode the expected output.
| do df = 1,ndf_w2h | ||
| answer = Nu(df)*(Mu(df,df,1)*uv(df) - Grad(df,1,1)*p(1)) | ||
| @assertEqual(answer, Luv(df), tol) | ||
| end do |
Correct answers shouldn't be calculated; they should be hardcoded.
| do df = 1, ndf_w2h | ||
| answer = mask(df)*( 0.5_r_solver*rhs(df) + y(df)*z(df)*grad(1,df,1)*x(1)) | ||
| answer = mask(df)*( 0.5_r_solver*rhs(df) + y(df)*grad(1,df,1)*x(1)) | ||
| @assertEqual(answer, Luv(df), tol) | ||
| end do |
As before, known answers shouldn't be calculated; that's what the kernel under test is for.
Co-authored-by: Ricky Wong <141156427+mo-rickywong@users.noreply.github.com>
…ner_alg_mod.x90 Co-authored-by: Ricky Wong <141156427+mo-rickywong@users.noreply.github.com>
…2hb_kernel_mod.F90 Co-authored-by: Ricky Wong <141156427+mo-rickywong@users.noreply.github.com>
…2hb_kernel_mod.F90 Co-authored-by: Ricky Wong <141156427+mo-rickywong@users.noreply.github.com>
PR Summary
Sci/Tech Reviewer: @tommbendall
Code Reviewer: @mo-rickywong
Add in a number of solver optimisations.
The principal performance improvement in this pull request is to split the application of the mixed operator into two separate new kernels.
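The restructuring can be sketched in toy form. This is a minimal sketch, assuming a periodic 1-D row of columns and made-up coefficients; `apply_fused`, `apply_step1`, `apply_step2` and `east` are hypothetical names, not the new LFRic kernels. Following the code comment in the review thread, the first step is the only one that reads a neighbouring column (so in a distributed run it is the only one that would need halo data), while the second step is column-local. The sketch shows only that the split reproduces the fused result; it says nothing about cache behaviour or timings.

```fortran
! Toy illustration only: every name here is hypothetical, not LFRic code.
module toy_split_operator_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Neighbouring column index on a periodic 1-D row of n columns.
  pure integer function east(i, n)
    integer, intent(in) :: i, n
    east = modulo(i, n) + 1
  end function east

  ! Single fused pass: both outputs computed in the same loop over columns.
  subroutine apply_fused(u, w, p, yu, yw)
    real(dp), intent(in)  :: u(:), w(:), p(:)
    real(dp), intent(out) :: yu(:), yw(:)
    integer :: i, n
    n = size(p)
    do i = 1, n
      yu(i) = 2.0_dp*u(i) - (p(east(i, n)) - p(i))  ! reads a neighbour column
      yw(i) = 3.0_dp*w(i) - p(i)                    ! column-local only
    end do
  end subroutine apply_fused

  ! Step 1: horizontal part only; the only step reading neighbour columns.
  subroutine apply_step1(u, p, yu)
    real(dp), intent(in)  :: u(:), p(:)
    real(dp), intent(out) :: yu(:)
    integer :: i, n
    n = size(p)
    do i = 1, n
      yu(i) = 2.0_dp*u(i) - (p(east(i, n)) - p(i))
    end do
  end subroutine apply_step1

  ! Step 2: column-local part; needs no neighbour (halo) data.
  subroutine apply_step2(w, p, yw)
    real(dp), intent(in)  :: w(:), p(:)
    real(dp), intent(out) :: yw(:)
    integer :: i
    do i = 1, size(p)
      yw(i) = 3.0_dp*w(i) - p(i)
    end do
  end subroutine apply_step2
end module toy_split_operator_mod
```

Here both versions evaluate identical expressions, so the results match exactly; in real kernels, reordering floating-point operations can change rounding, which is one way bit-wise differences in KGOs can arise.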
These changes result in a performance improvement for 2 main reasons
The C224 & C896 lfric atm tests in the test suite were run with these changes giving the following solver times
Code Quality Checklist
Testing
trac.log
These results are from before the KGO update. The failure in lfric_inputs does not appear to be caused by this pull request, as none of the changed code should be exercised there; it is likely one of the occasional lfric_inputs failures that we see.
Test Suite Results - lfric_apps - solver_improvements/run6
Suite Information
Task Information
❌ failed tasks - 150
⌛ waiting tasks - 2
Security Considerations
Performance Impact
AI Assistance and Attribution
Documentation
PSyclone Approval
Sci/Tech Review
(Please alert the code reviewer via a tag when you have approved the SR)
Code Review