DataNodeExtractTrans implementation #3301

Open
LonelyCat124 wants to merge 11 commits into master from 1646_datanode_extract_trans

Conversation

@LonelyCat124 (Collaborator)

Initial implementation - the transformation is quite straightforward so it should be good for a review, assuming CI is ok.

@codecov

codecov bot commented Jan 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.95%. Comparing base (59f796e) to head (5c5fea5).
⚠️ Report is 18 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3301   +/-   ##
=======================================
  Coverage   99.95%   99.95%           
=======================================
  Files         380      381    +1     
  Lines       53949    54020   +71     
=======================================
+ Hits        53927    53998   +71     
  Misses         22       22           


@LonelyCat124 (Collaborator Author)

@sergisiso @arporter This is ready for a first review. The implementation is currently pretty small and neat - if you want more functionality testing let me know and I can add more.

@arporter (Member) left a comment

Thanks very much Aidan - as you say it's pleasingly simple. Also as you say, it could do with some more checks and associated tests :-)

On the naming front, I worry that the use of Extract will be confused with the existing ExtractTrans, which is obviously quite different. I asked an AI (two of them) and it seems that IntroduceLocal/TemporaryTrans is a good alternative that doesn't involve "Extract".

Finally, (once everything else is done) I think it would be exciting (and possibly educational) to try this for real. I'm thinking we'd alter nemo/scripts/utils.py::normalise_loops so that it e.g. looks for any Calls which have array expressions as arguments.
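A rough sketch of what that normalise_loops extension might look like - the helper name, the DataNodeExtractTrans import path and the Call.arguments usage are assumptions, not the actual utils.py change:

    from psyclone.psyir.nodes import Call, Literal, Reference
    from psyclone.psyir.transformations import TransformationError
    # DataNodeExtractTrans is the class added by this PR; import path assumed.
    from psyclone.psyir.transformations import DataNodeExtractTrans


    def extract_call_arguments(schedule):
        '''Extract any non-trivial argument expression of a Call into a
        temporary so that the argument becomes a simple Reference.'''
        dtrans = DataNodeExtractTrans()
        for call in schedule.walk(Call):
            for arg in call.arguments:
                # Skip arguments that are already plain references/literals.
                if isinstance(arg, (Reference, Literal)):
                    continue
                try:
                    dtrans.apply(arg)
                except TransformationError:
                    # Leave anything the transformation rejects untouched.
                    pass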

"OMPDeclareTargetTrans",
"OMPCriticalTrans",
"OMPParallelTrans",
"DataNodeExtractTrans",

Member:

I think the doc generation works better if we don't put the entry here but instead have a __all__ at the end of the file containing the class implementation. (We can keep the earlier import though since that makes it available from the transformations module.)
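For illustration, a minimal sketch of that layout (the filename in the comment is a guess):

    # At the end of the file implementing the class (filename a guess,
    # e.g. datanode_extract_trans.py), after the class definition, so that
    # the documentation generation picks the class up from this module:
    __all__ = ["DataNodeExtractTrans"]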

Collaborator Author:

Changed.

f"'{node.debug_string().strip()}'."
)

dtype = node.datatype

Member:

Please move this up to L93 so that we can re-use the result rather than re-computing it.

Collaborator Author:

Done

)

def apply(self, node: DataNode, storage_name: str = "", **kwargs):
"""Applies the DataNodeExtractTransApplies to the input arguments.

Member:

"...Trans to the input ..."

Collaborator Author:

Fixed.

"""Applies the DataNodeExtractTransApplies to the input arguments.

:param node: The datanode to extract.
:param storage_name: The name of the temporary variable to store

Member:

Perhaps "The (base) name of the..." to indicate that it won't necessarily be exactly that name that is used. Also, please could you de-dent the following lines to just use four spaces so we don't need as many lines.

Collaborator Author:

I guess we need to decide whether this should be a base name or whether, if the storage_name argument is provided, the name must be exact. Happy to go either way - if the name must be exact I should probably add it to validate.
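For reference, a minimal sketch of how the "base name" reading behaves with SymbolTable.new_symbol (the INTEGER_TYPE datatype and the exact generated suffix are illustrative):

    from psyclone.psyir.symbols import DataSymbol, INTEGER_TYPE, SymbolTable

    table = SymbolTable()
    # new_symbol treats root_name as a base name and appends a suffix when
    # that name is already taken, so repeated calls give e.g. tmp, tmp_1, ...
    sym1 = table.new_symbol(root_name="tmp", symbol_type=DataSymbol,
                            datatype=INTEGER_TYPE)
    sym2 = table.new_symbol(root_name="tmp", symbol_type=DataSymbol,
                            datatype=INTEGER_TYPE)
    print(sym1.name, sym2.name)  # e.g. "tmp" and "tmp_1"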

f"so the DataNodeExtractTrans cannot be applied. "
f"Input node was '{node.debug_string().strip()}'."
)

Member:

Please add a check that storage_name is a str and raise a TypeError if not.

Collaborator Author:

This is done by the validate_options function so we don't need to.

assign = psyir.walk(Assignment)[0]
with pytest.raises(TransformationError) as err:
    dtrans.validate(assign.rhs)
assert ("Input node's datatype is an array of unknown size, so the "

Member:

In this case, we know that we can query e.g. a for its L/UBOUND but I guess that this requires an extension to the datatype method which you discussed with @sergisiso elsewhere. Might be worth a comment.

Collaborator:

If datatype does not work you can check the expression produced by get_effective_shape.

But this made me think that expressions using imported symbols (even with explicit bounds) may be a problem. Is the symbol in the explicit bound available locally? Does it have the global or the local interface?

Collaborator Author:

@sergisiso Is there a good way to test this? I'm a bit unfamiliar with making codes that correctly follow imports for the test suite.

Collaborator:

Yes, there are a few tests that use a code and an imported module code. See for example src/psyclone/tests/psyir/symbols/symbol_table_test.py::test_resolve_imports_from_child_symtabs

Collaborator Author:

This works until an array index is itself an imported symbol in whatever module it's imported from. I need to think of a good solution for that.

Member:

Yes, I agree that it's only the latter check that's necessary. In general, it's best not to have an individual transformation import symbols because that performs very badly. It's much better to get the frontend to import them when it first constructs the PSyIR (by having the module names added to RESOLVE_IMPORTS in the transformation script). Therefore, if you could update the error message to say this that would be great - you'll probably need to finesse it a bit depending on whether it is Unresolved or Unsupported.
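For context, a minimal sketch of the RESOLVE_IMPORTS mechanism being referred to, as a transformation script would use it (the module name grid_mod is hypothetical):

    # In a PSyclone transformation script: ask the frontend to resolve the
    # symbols imported from grid_mod (hypothetical name) when it first builds
    # the PSyIR, rather than having individual transformations chase imports.
    RESOLVE_IMPORTS = ["grid_mod"]


    def trans(psyir):
        # ... apply DataNodeExtractTrans (and anything else) here ...
        pass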

Member:

Further, it's probably going to really help the user if we could say which Symbol(s) is/are causing the problem. Therefore, once we know we're going to raise an error, it would be worth examining node to get its Symbols and seeing which of them are the problem.
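One possible shape for that diagnostic - a sketch only; the exact type names checked (UnresolvedType/UnsupportedType) would need adjusting to the PSyclone version and the finessing mentioned above:

    from psyclone.psyir.nodes import Reference
    from psyclone.psyir.symbols import UnresolvedType, UnsupportedType

    def _problem_symbols(node):
        '''Collect the names of Symbols referenced by `node` whose datatype
        is unresolved or unsupported, so they can be named in the error.'''
        names = set()
        for ref in node.walk(Reference):
            sym = ref.symbol
            dtype = getattr(sym, "datatype", None)
            if dtype is None or isinstance(dtype,
                                           (UnresolvedType, UnsupportedType)):
                names.add(sym.name)
        return sorted(names)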

Collaborator Author:

I need to think more about the "symbol merging" as it seems kind of hard to do - the existing symbol can be not equal but essentially the same symbol (same module, name, type, etc.), so I need to work out how to handle this.

Collaborator Author:

Ah, by "import symbols" I only meant adding the relevant `use x_mod: symbol` statements when we find such symbols while resolving known imports.

Collaborator Author:

I've added some code now to handle imported symbols, including those in shape declarations. I'm hoping I've not over-engineered it, but I'll start working on the tests for it next.

assert ("Input node's datatype cannot be computed, so the "
"DataNodeExtractTrans cannot be applied. Input node "
"was 'b + a'" in str(err.value))

Member:

Please add a test for an UnsupportedType declaration.

Member:

Please also add tests when the transformation is given the wrong type of node and a non-str name.

Collaborator Author:

Added

assign = psyir.walk(Assignment)[0]
dtrans.apply(assign.rhs.operands[1])
out = fortran_writer(psyir)
assert ("integer, dimension(SIZE(a, dim=1),SIZE(b, dim=2)) :: tmp"

Member:

Ah, this was what you discussed. I guess the same comment applies (that we could do better but it's up to datatype to do that).

out = fortran_writer(psyir)
assert "integer :: temporary" in out
assert "temporary = INT(a)" in out
assert "b = temporary" in out

Member:

Please could you add at least one test with a more complicated code structure - e.g. an expression inside a loop. This will exercise the hierarchy of symbol tables.
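A hedged sketch of such a test - the Fortran source, the expected "tmp" name and the asserted output strings are illustrative rather than taken from the PR:

    from psyclone.psyir.frontend.fortran import FortranReader
    from psyclone.psyir.backend.fortran import FortranWriter
    from psyclone.psyir.nodes import Assignment
    # DataNodeExtractTrans is the class added by this PR; import path assumed.
    from psyclone.psyir.transformations import DataNodeExtractTrans


    def test_datanodeextract_inside_loop():
        '''Extract an expression that sits inside a loop body so that the
        new temporary has to be declared via the enclosing routine's
        symbol-table hierarchy.'''
        code = '''
        subroutine sub(a, b, n)
          integer, intent(in) :: n
          integer, dimension(n), intent(in) :: a
          integer, dimension(n), intent(inout) :: b
          integer :: i
          do i = 1, n
            b(i) = a(i) + 2 * a(i)
          end do
        end subroutine sub
        '''
        psyir = FortranReader().psyir_from_source(code)
        assign = psyir.walk(Assignment)[0]
        DataNodeExtractTrans().apply(assign.rhs.operands[1])
        out = FortranWriter()(psyir)
        # Expected output strings are indicative only.
        assert "tmp = 2 * a(i)" in out
        assert "b(i) = a(i) + tmp" in out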

Collaborator Author:

Added


# Create a symbol of the relevant type.
if not storage_name:
    symbol = node.scope.symbol_table.new_symbol(

Member:

Talking with @sergisiso about possible use-cases for this transformation, he mentioned things like:

   a(2:4) = 3*a(1:3)

where we need to explicitly store the copy of a(1:3) in a temporary in order to parallelise. With this transformation now we'd generate:

  tmp = a(1:3)
  a(2:4) = 3*tmp

That has made me think that we need to check that we create tmp with dimensions starting at 1. (I think we do because that's what datatype returns?)
Are there any other considerations @sergisiso?

Collaborator:

That has made me think that we need to check that we create tmp with dimensions starting at 1.

I am not sure about this; if it doesn't start at the same lbound as the lhs it will generate the offset expression.

However, impure calls could be a problem:

a = globalvar + b(fn(4))

If we try to move b, the call to fn could internally update 'globalvar', which would change the order of the use and the update. I would not allow impure functions or functions of unknown purity.

Collaborator Author:

Added tests for both of these cases with appropriate errors in validate. (I think the function one is currently already caught by the datatype but I was explicit in case we make improvements in the future).

Collaborator:

Just to be clear, the problem is there even without moving the fn():
a = fn(3) + b(globalvar)
moving b still changes the order if fn also uses globalvar, and this one is not caught by datatype.

Also, if the result_type of fn is known we will have a valid datatype, but the call may still be impure.

Collaborator Author:

Yeah - probably just my inexperience: I couldn't create a test that could correctly resolve the datatype of a function (but I do fail on any impure function call in the provided node). I guess what I don't currently handle is:

a = fn(somevar) + b(globalvar) - if b was provided as the argument (and is PURE) then in theory we'd happily move it, which is probably incorrect if fn is impure? I'm not sure what the best solution to this is - probably check the statement containing the input node for impure calls?
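A rough sketch of that "check the containing statement" idea - the is_pure query is an assumption about how purity information is exposed, so treat this as pseudocode for the approach rather than the PR's implementation:

    from psyclone.psyir.nodes import Call, Statement

    def _statement_has_impure_call(node):
        '''Sketch: inspect the whole Statement containing `node` and flag it
        if any Call within it is not positively known to be pure. The
        is_pure attribute queried below is an assumption (it may be absent
        or None for unresolved routines).'''
        stmt = node.ancestor(Statement, include_self=True)
        if stmt is None:
            return False
        for call in stmt.walk(Call):
            if getattr(call, "is_pure", None) is not True:
                return True
        return False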

Collaborator Author:

FIXME: Ah, that's what you wrote - I'll make that change when I'm next on the branch.

Collaborator:

On second thoughts, this will be too restrictive for the first use case: taking argument expressions out of IO calls. These are always impure.

So now I think we don't need this check. Sorry for the distraction.

@LonelyCat124 (Collaborator Author)

@arporter I think I fixed all the issues I found from the last review, ready for another look.
