While doing some performance tests for #3213, I noticed extremely slow performance in the existing dependency analysis. Below are the runtimes, in seconds, of can_loop_be_parallelised for two ukca files:
| File name | loop# | DepTools | parallel? |
|---|---|---|---|
| ukca_um_strat_photol_mod.F90 | 2 | Execution time loop i: 11.899324 | True |
| ukca_um_strat_photol_mod.F90 | 3 | Execution time loop j: 6.690157 | True |
| ukca_aerod.F90 | 1 | Execution time loop j: 38.493939 | True |
One reason for this is the rather expensive comparison of identical terms (j == j): the dependency analysis goes all the way to converting this to a SymPy expression, solves j+dj == j for dj, and then, with dj=0, concludes that there is no loop-carried dependency :)
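To make the `j+dj == j` step concrete, here is a minimal sketch of the same equation solved by hand for the linear special case `a*j + b`. It is not the SymPy-based code in the dependency tools; `linear_distance` and its parameters are hypothetical names used only for illustration.

```python
from fractions import Fraction
from typing import Optional


def linear_distance(a: int, b_write: int, b_other: int) -> Optional[Fraction]:
    """Solve a*(j + dj) + b_write == a*j + b_other for dj.

    Hypothetical helper: covers only indices of the form a*j + b.
    Returns None when a == 0, because the index then does not depend
    on j and there is no unique distance.
    """
    if a == 0:
        return None
    # The a*j terms cancel, leaving a*dj == b_other - b_write.
    return Fraction(b_other - b_write, a)


# Identical index `j` in both accesses: a=1, both offsets 0.
same_index = linear_distance(1, 0, 0)
# Index `j` in one access and `j+1` in the other.
shifted = linear_distance(1, 0, 1)
# Index that does not involve j at all.
no_loop_var = linear_distance(0, 3, 3)
```

For identical indices the offsets are equal, so the distance is always zero and the symbolic solve has nothing to discover, which is why this case is a candidate for a shortcut.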
Adding a shortcut for this special case improved the timings to:
| File name | loop# | DepTools | parallel? |
|---|---|---|---|
| ukca_um_strat_photol_mod.F90 | 2 | Execution time loop i: 0.221416 | True |
| ukca_um_strat_photol_mod.F90 | 3 | Execution time loop j: 0.188218 | True |
| ukca_aerod.F90 | 1 | Execution time loop j: 0.829583 | True |
That is a speedup of between roughly 35 and 54 (potentially more: I see some runtime variation, and these are not the fastest measurements).
The patch:
@@ -632,9 +632,21 @@ class DependencyTools():
             elif len(set_of_vars) == 1:
                 # One loop variable used in both accesses.
                 # E.g. `a(2*i+3) = a(i*i)`
+                if (isinstance(index_write, Reference) and
+                        isinstance(index_other, Reference) and
+                        index_write == index_other):
+                    if index_write.name == loop_var:
+                        return True
+                    # The expression does not depend on the loop variable
+                    # at all (dependency distance would return None),
+                    # again no need for an explicit test, we can continue
+                    # the outer loop
+                    continue
This still passed all tests.
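The logic of the shortcut can also be sketched in isolation. This is an illustration under stated assumptions, not the code in DependencyTools: `Ref` is a hypothetical stand-in for a plain variable reference, and `shortcut` returns a label instead of using `return True` or `continue` inside the surrounding loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a plain variable reference used as an index."""
    name: str


def shortcut(index_write: object, index_other: object,
             loop_var: str) -> Optional[str]:
    """Classify a pair of index expressions without any symbolic work.

    Returns "zero_distance" if both indices are the loop variable itself,
    "independent" if both are the same variable but not the loop variable,
    and None if the pair is not covered and needs the full analysis.
    """
    if (isinstance(index_write, Ref) and
            isinstance(index_other, Ref) and
            index_write == index_other):
        if index_write.name == loop_var:
            return "zero_distance"
        return "independent"
    return None


both_loop_var = shortcut(Ref("j"), Ref("j"), "j")
both_other_var = shortcut(Ref("k"), Ref("k"), "j")
different_vars = shortcut(Ref("j"), Ref("i"), "j")
```

Only the cheap `isinstance` and equality checks run in the common case; anything more complex than two identical plain references falls through (`None` here) to the existing analysis.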