DataNodeExtractTrans implementation #3301

Open
LonelyCat124 wants to merge 11 commits into master from 1646_datanode_extract_trans

Conversation

@LonelyCat124 (Collaborator)

Initial implementation - the transformation is quite straightforward so it should be good for a review, assuming CI is ok.

@codecov

codecov bot commented Jan 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.95%. Comparing base (59f796e) to head (5c5fea5).
⚠️ Report is 18 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3301   +/-   ##
=======================================
  Coverage   99.95%   99.95%           
=======================================
  Files         380      381    +1     
  Lines       53949    54020   +71     
=======================================
+ Hits        53927    53998   +71     
  Misses         22       22           


@LonelyCat124 (Collaborator Author)

@sergisiso @arporter This is ready for a first review. The implementation is currently pretty small and neat - if you want more functionality testing let me know and I can add more.

@arporter (Member) left a comment

Thanks very much Aidan - as you say it's pleasingly simple. Also as you say, it could do with some more checks and associated tests :-)

On the naming front, I worry that the use of Extract will be confused with the existing ExtractTrans, which is obviously quite different. I asked an AI (two of them) and it seems that IntroduceLocal/TemporaryTrans is a good alternative that doesn't involve "Extract".

Finally, (once everything else is done) I think it would be exciting (and possibly educational) to try this for real. I'm thinking we'd alter nemo/scripts/utils.py::normalise_loops so that it e.g. looks for any Calls which have array expressions as arguments.
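A rough sketch of what that normalise_loops extension might look like - the helper name, the DataNodeExtractTrans import path and the Call.arguments usage are assumptions, not the actual utils.py change:

    from psyclone.psyir.nodes import Call, Literal, Reference
    from psyclone.psyir.transformations import TransformationError
    # DataNodeExtractTrans is the class added by this PR; import path assumed.
    from psyclone.psyir.transformations import DataNodeExtractTrans


    def extract_call_arguments(schedule):
        '''Extract any non-trivial argument expression of a Call into a
        temporary so that the argument becomes a simple Reference.'''
        dtrans = DataNodeExtractTrans()
        for call in schedule.walk(Call):
            for arg in call.arguments:
                # Skip arguments that are already plain references/literals.
                if isinstance(arg, (Reference, Literal)):
                    continue
                try:
                    dtrans.apply(arg)
                except TransformationError:
                    # Leave anything the transformation rejects untouched.
                    pass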

"OMPDeclareTargetTrans",
"OMPCriticalTrans",
"OMPParallelTrans",
"DataNodeExtractTrans",

Member:

I think the doc generation works better if we don't put the entry here but instead have a __all__ at the end of the file containing the class implementation. (We can keep the earlier import though since that makes it available from the transformations module.)
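For illustration, a minimal sketch of that layout (the filename in the comment is a guess):

    # At the end of the file implementing the class (filename a guess,
    # e.g. datanode_extract_trans.py), after the class definition, so that
    # the documentation generation picks the class up from this module:
    __all__ = ["DataNodeExtractTrans"]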

Collaborator Author:

Changed.

f"'{node.debug_string().strip()}'."
)

dtype = node.datatype

Member:

Please move this up to L93 so that we can re-use the result rather than re-computing it.

Collaborator Author:

Done

)

def apply(self, node: DataNode, storage_name: str = "", **kwargs):
"""Applies the DataNodeExtractTransApplies to the input arguments.

Member:

"...Trans to the input ..."

Collaborator Author:

Fixed.

"""Applies the DataNodeExtractTransApplies to the input arguments.

:param node: The datanode to extract.
:param storage_name: The name of the temporary variable to store

Member:

Perhaps "The (base) name of the..." to indicate that it won't necessarily be exactly that name that is used. Also, please could you de-dent the following lines to just use four spaces so we don't need as many lines.

Collaborator Author:

I guess we need to decide whether this should be a base name or whether, if the storage_name argument is provided, the name must be exact. Happy to go either way - if the name must be exact I should probably add it to validate.
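For reference, a minimal sketch of how the "base name" reading behaves with SymbolTable.new_symbol (the INTEGER_TYPE datatype and the exact generated suffix are illustrative):

    from psyclone.psyir.symbols import DataSymbol, INTEGER_TYPE, SymbolTable

    table = SymbolTable()
    # new_symbol treats root_name as a base name and appends a suffix when
    # that name is already taken, so repeated calls give e.g. tmp, tmp_1, ...
    sym1 = table.new_symbol(root_name="tmp", symbol_type=DataSymbol,
                            datatype=INTEGER_TYPE)
    sym2 = table.new_symbol(root_name="tmp", symbol_type=DataSymbol,
                            datatype=INTEGER_TYPE)
    print(sym1.name, sym2.name)  # e.g. "tmp" and "tmp_1"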

f"so the DataNodeExtractTrans cannot be applied. "
f"Input node was '{node.debug_string().strip()}'."
)

Member:

Please add a check that storage_name is a str and raise a TypeError if not.

Collaborator Author:

This is done by the validate_options function so we don't need to.

assign = psyir.walk(Assignment)[0]
with pytest.raises(TransformationError) as err:
    dtrans.validate(assign.rhs)
assert ("Input node's datatype is an array of unknown size, so the "

Member:

In this case, we know that we can query e.g. a for its L/UBOUND but I guess that this requires an extension to the datatype method which you discussed with @sergisiso elsewhere. Might be worth a comment.

Collaborator:

If datatype does not work you can check the expression produced by get_effective_shape.

But this made me think that expressions using imported symbols (even with explicit bounds) may be a problem. Is the symbol in the explicit bound available locally? Does it have the global or the local interface?

Collaborator Author:

@sergisiso Is there a good way to test this? I'm a bit unfamiliar with making codes that correctly follow imports for the test suite.

Collaborator:

Yes, there are a few tests that use a code and an imported module code. See for example src/psyclone/tests/psyir/symbols/symbol_table_test.py::test_resolve_imports_from_child_symtabs

Collaborator Author:

This works until an array index is itself an imported symbol in whatever module it's imported from. I need to think of a good solution for that.

Member:

Yes, I agree that it's only the latter check that's necessary. In general, it's best not to have an individual transformation import symbols because that performs very badly. It's much better to get the frontend to import them when it first constructs the PSyIR (by having the module names added to RESOLVE_IMPORTS in the transformation script). Therefore, if you could update the error message to say this that would be great - you'll probably need to finesse it a bit depending on whether it is Unresolved or Unsupported.
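For context, a minimal sketch of the RESOLVE_IMPORTS mechanism being referred to, as a transformation script would use it (the module name grid_mod is hypothetical):

    # In a PSyclone transformation script: ask the frontend to resolve the
    # symbols imported from grid_mod (hypothetical name) when it first builds
    # the PSyIR, rather than having individual transformations chase imports.
    RESOLVE_IMPORTS = ["grid_mod"]


    def trans(psyir):
        # ... apply DataNodeExtractTrans (and anything else) here ...
        pass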

Member:

Further, it's probably going to really help the user if we could say which Symbol(s) is/are causing the problem. Therefore, once we know we're going to raise an error, it would be worth examining node to get its Symbols and seeing which of them are the problem.
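One possible shape for that diagnostic - a sketch only; the exact type names checked (UnresolvedType/UnsupportedType) would need adjusting to the PSyclone version and the finessing mentioned above:

    from psyclone.psyir.nodes import Reference
    from psyclone.psyir.symbols import UnresolvedType, UnsupportedType

    def _problem_symbols(node):
        '''Collect the names of Symbols referenced by `node` whose datatype
        is unresolved or unsupported, so they can be named in the error.'''
        names = set()
        for ref in node.walk(Reference):
            sym = ref.symbol
            dtype = getattr(sym, "datatype", None)
            if dtype is None or isinstance(dtype,
                                           (UnresolvedType, UnsupportedType)):
                names.add(sym.name)
        return sorted(names)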

Collaborator Author:

I need to think more about the "symbol merging" as it seems kind of hard to do - the existing symbol can be not equal but essentially the same symbol (same module, name, type, etc.), so I need to work out how to handle this.

Collaborator Author:

Ah, by "import symbols" I only meant adding the relevant `use x_mod: symbol` statements when we find such symbols while resolving known imports.

Collaborator Author:

I've added some code now to handle imported symbols, including those in shape declarations. I'm hoping I've not over-engineered it, but I'll start working on the tests for it next.

assert ("Input node's datatype cannot be computed, so the "
"DataNodeExtractTrans cannot be applied. Input node "
"was 'b + a'" in str(err.value))

Member:

Please add a test for an UnsupportedType declaration.

Member:

Please also add tests when the transformation is given the wrong type of node and a non-str name.

Collaborator Author:

Added

assign = psyir.walk(Assignment)[0]
dtrans.apply(assign.rhs.operands[1])
out = fortran_writer(psyir)
assert ("integer, dimension(SIZE(a, dim=1),SIZE(b, dim=2)) :: tmp"

Member:

Ah, this was what you discussed. I guess the same comment applies (that we could do better but it's up to datatype to do that).

out = fortran_writer(psyir)
assert "integer :: temporary" in out
assert "temporary = INT(a)" in out
assert "b = temporary" in out

Member:

Please could you add at least one test with a more complicated code structure - e.g. an expression inside a loop. This will exercise the hierarchy of symbol tables.
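A hedged sketch of such a test - the Fortran source, the expected "tmp" name and the asserted output strings are illustrative rather than taken from the PR:

    from psyclone.psyir.frontend.fortran import FortranReader
    from psyclone.psyir.backend.fortran import FortranWriter
    from psyclone.psyir.nodes import Assignment
    # DataNodeExtractTrans is the class added by this PR; import path assumed.
    from psyclone.psyir.transformations import DataNodeExtractTrans


    def test_datanodeextract_inside_loop():
        '''Extract an expression that sits inside a loop body so that the
        new temporary has to be declared via the enclosing routine's
        symbol-table hierarchy.'''
        code = '''
        subroutine sub(a, b, n)
          integer, intent(in) :: n
          integer, dimension(n), intent(in) :: a
          integer, dimension(n), intent(inout) :: b
          integer :: i
          do i = 1, n
            b(i) = a(i) + 2 * a(i)
          end do
        end subroutine sub
        '''
        psyir = FortranReader().psyir_from_source(code)
        assign = psyir.walk(Assignment)[0]
        DataNodeExtractTrans().apply(assign.rhs.operands[1])
        out = FortranWriter()(psyir)
        # Expected output strings are indicative only.
        assert "tmp = 2 * a(i)" in out
        assert "b(i) = a(i) + tmp" in out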

Collaborator Author:

Added


# Create a symbol of the relevant type.
if not storage_name:
    symbol = node.scope.symbol_table.new_symbol(

Member:

Talking with @sergisiso about possible use-cases for this transformation, he mentioned things like:

   a(2:4) = 3*a(1:3)

where we need to explicitly store the copy of a(1:3) in a temporary in order to parallelise. With this transformation now we'd generate:

  tmp = a(1:3)
  a(2:4) = 3*tmp

That has made me think that we need to check that we create tmp with dimensions starting at 1. (I think we do because that's what datatype returns?)
Are there any other considerations @sergisiso?

Collaborator:

That has made me think that we need to check that we create tmp with dimensions starting at 1.

I am not sure about this; if it doesn't start at the same lbound as the lhs it will generate the offset expression.

However, impure calls could be a problem:

a = globalvar + b(fn(4))

If we try to move b, the call to fn could internally update 'globalvar', which would change the order of the use and the update. I would not allow impure functions or functions of unknown purity.

Collaborator Author:

Added tests for both of these cases with appropriate errors in validate. (I think the function one is currently already caught by the datatype but I was explicit in case we make improvements in the future).

Collaborator:

Just to be clear, the problem is there even without moving the fn():
a = fn(3) + b(globalvar)
moving b still changes the order if fn also uses globalvar, and this one is not caught by datatype.

Also, if the result_type of fn is known we will have a valid datatype, but the call may still be impure.

Collaborator Author:

Yeah - probably just my inexperience: I couldn't create a test that could correctly resolve the datatype of a function (but I do fail on any impure function call in the provided node). I guess what I don't currently handle is:

a = fn(somevar) + b(globalvar) - if b was provided as the argument (and is PURE) then in theory we'd happily move it, which is probably incorrect if fn is impure? I'm not sure what the best solution to this is - probably check the statement containing the input node for impure calls?
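A rough sketch of that "check the containing statement" idea - the is_pure query is an assumption about how purity information is exposed, so treat this as pseudocode for the approach rather than the PR's implementation:

    from psyclone.psyir.nodes import Call, Statement

    def _statement_has_impure_call(node):
        '''Sketch: inspect the whole Statement containing `node` and flag it
        if any Call within it is not positively known to be pure. The
        is_pure attribute queried below is an assumption (it may be absent
        or None for unresolved routines).'''
        stmt = node.ancestor(Statement, include_self=True)
        if stmt is None:
            return False
        for call in stmt.walk(Call):
            if getattr(call, "is_pure", None) is not True:
                return True
        return False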

Collaborator Author:

FIXME: Ah, that's what you wrote - I'll make that change when I'm next on the branch.

Collaborator:

On second thoughts, this will be too restrictive for the first use case: taking argument expressions out of IO calls. These are always impure.

So now I think we don't need this check. Sorry for the distraction.

@LonelyCat124 (Collaborator Author)

@arporter I think I fixed all the issues I found from the last review, ready for another look.
