Skip to content

Conversation

@dafedafe
Copy link
Contributor

@dafedafe dafedafe commented Dec 2, 2025

Issue

A few test failed intermittently with -Xcomp on Mac due to an unschedulable graph in C2:

  • valhalla/valuetypes/WeakReferenceTest.java
  • valhalla/valuetypes/ProxyTest.java
  • valhalla/valuetypes/ObjectNewInstance.java
  • valhalla/valuetypes/ObjectMethodsViaCondy.java

Causes

The origin of the issue seems to be coming from strength reduction (process_late_inline_calls_no_inline) where we replace virtual and MH calls with direct calls.
https://github.com/dafedafe/valhalla/blob/75e2dd95df5d847d7d6e35a23016d22705681cf4/src/hotspot/share/opto/compile.cpp#L3072
If the return type of the methods are not loaded, we add a call to runtime's store_inline_type_fields_to_buf right after the actual method call, to save the scalarized return into a oop. This happens first for the caller at parse time and then for the callee when strength-reducing the virtual call to a direct one. The return projections of the inline fields of the call are added to store_inline_type_fields_to_buf and its oop projection is then added as input to the other store_inline_type_fields_to_buf which fundamentally leaves the graph around it in an awkward state.

If this happens in combination with loop unswitching it can lead to a graph not being schedulable, which is what happens in the failure of this issue, where we have:

  • a virtual call with a following store_inline_type_fields_to_buf (1) in a loop.
  • the loop undergoes unswitching, which creates 2 copies of the body (including a copy of the virtual call). All outputs of the new virtual call are phi-d with the one of the other path as input of the store_inline_type_fields_to_buf.
  • the new copy of the virtual call is later replaced with a direct call: the creation of the new direct call adds a store_inline_type_fields_to_buf (2) right after the direct call. All the inline type return values of the call being inlined are removed, so the phis now only have one input and are removed by a later GVN pass.
Screenshot 2025-11-26 at 10 33 51
  • this creates an issue later during GCM since the store_inline_type_fields_to_buf (1) call is logically not dominated by any of the 2 sides of the unswitched loop and lands in a separate dominance path of the arguments whose phis have been removed (which are still dominated by the original virtual call).

Solution

The issue happens only when strength-reducing to a direct call and the return type is not loaded. So, there seems to be 2 conceivable fixes: either we avoid strength-reduction if the type is not loaded or we avoid creating the second store_inline_type_fields_to_buf and rewire the output projections of the strength-reduced call to the output of the projections of the original virtual call.
By choosing the first solution we potentially skip a useful optimization (in particular if the return value is always null, which won't trigger the deoptimization). So, the rewiring solution seems to be the best option in this case.

Testing

  • repeated testing with failing tests (same conditions)
  • replay of failing compilations
  • Tier 1-3

Unfortunately creating a regression test that would consistently trigger the specific issue with unloaded return type and strength-reduction to direct call has proven to be unviable.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Error

 ⚠️ 8369045 is used in problem lists: [test/jdk/ProblemList-enable-preview.txt]

Issue

  • JDK-8369045: [lworld] valhalla/valuetypes/WeakReferenceTest.java has an unschedulable graph (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1768/head:pull/1768
$ git checkout pull/1768

Update a local copy of the PR:
$ git checkout pull/1768
$ git pull https://git.openjdk.org/valhalla.git pull/1768/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1768

View PR using the GUI difftool:
$ git pr show -t 1768

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1768.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 2, 2025

👋 Welcome back dfenacci! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 2, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title JDK-8369045: [lworld] valhalla/valuetypes/WeakReferenceTest.java has an unschedulable graph 8369045: [lworld] valhalla/valuetypes/WeakReferenceTest.java has an unschedulable graph Dec 2, 2025
@dafedafe dafedafe marked this pull request as ready for review December 10, 2025 08:23
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 10, 2025
@mlbridge
Copy link

mlbridge bot commented Dec 10, 2025

Webrevs

store_to_buf_call->init_req(i, top());
continue;
// Don't add store to buffer call if we are strength reducing
if (!C->strength_reduction()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use _gvn.is_IterGVN() && !C->inlining_incrementally() here instead? Assuming that whenever we call this after parsing and when not incrementally inlining, we are doing post-parse devirtualization of a call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a possibility. I've also considered checking for !C->inlining_incrementally() && C->late_inline_count() > 0 but I'm a bit torn between being 100% sure we are doing strength reduction but using a compile "flags" (a bit 🫤) and checking for other, already available, conditions (like the one you suggest), which make me a bit unsure and could potentially change in the future...
But I guess introducing a new field to Compile just for this should be avoided. So, let's go for _gvn.is_IterGVN() && !C->inlining_incrementally() 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spoke too soon: _gvn is actually set when the GraphKit is created (when generating the call) and is not using the context of the call being strength-reduced. So, unfortunately it seems that the least-invasive way to figure out if we are strength-reducing is to have a compile flag. I reverted the change.

if (domain->field_at(i) == Type::HALF) {
store_to_buf_call->init_req(i, top());
continue;
// Don't add store to buffer call if we are strength reducing
Copy link
Member

@TobiHartmann TobiHartmann Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to add the above runtime check of the result again when doing strength reduction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't 🙂
I moved it inside the not-strength-reducing block.

@TobiHartmann
Copy link
Member

You also need to remove the tests from the problem list again (see #1793).

@openjdk
Copy link

openjdk bot commented Dec 17, 2025

⚠️ @dafedafe This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@openjdk
Copy link

openjdk bot commented Dec 17, 2025

@dafedafe this pull request can not be integrated into lworld due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8369045
git fetch https://git.openjdk.org/valhalla.git lworld
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge lworld"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Dec 17, 2025
@dafedafe
Copy link
Contributor Author

You also need to remove the tests from the problem list again (see #1793).

Removed.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Dec 17, 2025
Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-conflict Pull request has merge conflict with target branch rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

2 participants