ronlieb (Collaborator) commented Jan 8, 2026

No description provided.

fpetrogalli and others added 30 commits January 8, 2026 00:35
…er. [NFCI] (llvm#174890)

This function is invoked only for ELF targets, therefore it has been
moved to the ELF-specific streamer.

An assertion has been added to catch invocations when the target is
not ELF.
…vm#174695)

Remove unneeded load instructions and keep only one comparison
instruction.
…antCopyElimination (llvm#174706)

This patch is like what Xqcibi did in
llvm#174358.
…t() (llvm#174438)

Some machines can make use of AUIPC + ADDI or LUI + ADDI fusion. Make sure to consider that in the cost model for `RISCVTTIImpl::getConstantPoolLoadCost()`.
…lvm#174811)

The keep-registers mode isn't super useful without disabling
explicit-locals, as the local gets/sets are irrelevant noise in most
cases. Switching this test makes the output much more concise and will
make upcoming changes easier to review.
Not sure if this will fix the problem because I don't have a 32-bit arm
machine to test with.
…vm#173268)

This commit adds family-specific support for the following
intrinsics:
- ldmatrix
- stmatrix
- mma.block_scale, mma.sp.block_scale
- redux.sync
- cvt.rs
- clusterlaunchcontrol
- setmaxnreg
- tcgen05.mma

Removed the `hasTcgen05Instructions` function in favour of `hasTcgen05InstSupport`. Updated the wmma.py script with family-specific support and added new tests.
This allows speculating recursively speculatable operations
containing `fir.result`. Note that making it Pure does not allow
speculating `fir.result` itself from its containing operation,
since it is a terminator.
Fixes a failure on llc for ubsan builds:

../lib/Target/RISCV/RISCVAsmPrinter.cpp:552:7: runtime error: downcast
of null pointer of type 'RISCVTargetStreamer'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../lib/Target/RISCV/RISCVAsmPrinter.cpp:552:7
llvm#173659)

This PR fixes a crash by validating the type of the `kind` attribute.
For `vector.contract` and `vector.outerproduct`, the verifier now emits
an error when `kind` is not a CombiningKindAttr. Fixes llvm#173555.
…e` to `shufflevector` (llvm#169110)

Resolves llvm#169058.

This adds ~~an InstCombine pass~~ a TTI hook to the WebAssembly backend
that folds `i8x16.swizzle` and `i8x16.relaxed.swizzle` operations to
`shufflevector` operations if their mask operands are constant.

This is mainly useful for abstractions over the raw intrinsics--for
instance, in architecture-generic SIMD code that may not be able to
expose the constant shuffles due to type system limitations.

I took most of this from the x86 backend (in particular,
`simplifyX86vpermilvar` in `X86InstCombineIntrinsic`), and adapted it
for the WebAssembly backend. There wasn't any previous
`instCombineIntrinsic` method on the WebAssembly `TargetTransformInfo`,
so I added it. Right now, this swizzle optimization is the only one it
performs.

As I noted in the transform itself, the "relaxed" swizzle actually has
stricter preconditions than the non-relaxed one. If a non-negative but
still out-of-bounds index is provided, the "relaxed" swizzle can choose
between returning 0 and the lane at the index modulo 16. However, it
must make the same choice every time, and we don't know which choice the
runtime will make, so we can't constant-fold it.
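The precondition described above can be sketched as a small lane-level model. This is only an illustration, not the code from the patch: the function names are hypothetical, mask bytes are treated as unsigned values in 0..255, and a "negative" index is assumed to mean a byte with its top bit set (128 or above).

```python
from typing import List, Optional

LANES = 16

def swizzle(a: List[int], mask: List[int]) -> List[int]:
    """Model of i8x16.swizzle: any index outside 0..15 selects 0."""
    return [a[i] if i < LANES else 0 for i in mask]

def fold_relaxed_swizzle(a: List[int], mask: List[int]) -> Optional[List[int]]:
    """Fold a relaxed swizzle with a constant mask, or return None.

    Indices in 16..127 are non-negative but out of bounds, so the runtime
    may return either 0 or lane (index % 16). The result is then unknown
    at compile time and the fold is refused.
    """
    if any(LANES <= i < 128 for i in mask):
        return None
    # Every index is either in bounds or has its top bit set (selects 0),
    # so the relaxed and non-relaxed forms agree.
    return swizzle(a, mask)

a = list(range(100, 116))
reverse = list(range(15, -1, -1))
print(fold_relaxed_swizzle(a, reverse))            # lanes of `a`, reversed
print(fold_relaxed_swizzle(a, [200] + [0] * 15))   # first lane 0, rest a[0]
print(fold_relaxed_swizzle(a, [16] + [0] * 15))    # None: 16 is ambiguous
```

The non-relaxed form never needs the check, which is why its precondition is the weaker of the two.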

The regression tests were mostly generated by Claude and adapted a bit
by me (I tried to follow the [InstCombine contributor
guide](https://llvm.org/docs/InstCombineContributorGuide.html#tests)).
There was previously no WebAssembly subdirectory within the InstCombine
tests, so I created that too; as of now, the swizzle fold test is the
only file in it. Everything else was written by myself (well, partly
copy-pasted from the x86 backend).

I'm not sure how to write an Alive2 test for this; I can't find any
examples where the input is an arbitrary constant.
This patch handles the special case where an extract value yields an
aggregate result, which then is used as an argument to a store. The
SPIRV BE uses special intrinsics (`spv_extractv` and `spv_store`) to
represent these through IRTranslator, however this creates a problem:
`spv_store` is called as a function, and IRTranslator cannot handle
arguments that take more than a vreg. For other functions, the aggregate
argument replacement pass would have solved things, but it does not
apply here. Hence, we apply the same mutate-into-Int32 solution here
when dealing with stores, and restore the extract value's type (which we
have available as a ValueAttr) during instruction selection.
…, Z) (llvm#173808)

We use fli+fneg to generate a negative float; eliminate the fneg for fma.
Fold fma to vfnmsac.vf, vfnmsub.vf, vfnmacc.vf, vfnmadd.vf.
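A minimal numeric sketch of why the fneg can be dropped, assuming the folded pattern is an fma whose scalar multiplicand is negated (the commit title is truncated, so this is a guess at the shape). Only `vfnmsac.vf` is modelled, using its documented per-lane semantics `vd[i] = -(f * vs2[i]) + vd[i]`. The helper names are hypothetical, and `fma` here is an unfused stand-in that is exact for the small values used.

```python
from typing import List

def fma(x: float, y: float, z: float) -> float:
    # Unfused stand-in for fma; exact for the small inputs used below.
    return x * y + z

def vfnmsac_vf(vd: List[float], f: float, vs2: List[float]) -> List[float]:
    # Per-lane model of vfnmsac.vf: vd[i] = -(f * vs2[i]) + vd[i].
    return [-(f * b) + d for d, b in zip(vd, vs2)]

def fma_with_negated_scalar(vd: List[float], f: float, vs2: List[float]) -> List[float]:
    # The unfolded form: materialise -f first, then do a plain fma per lane.
    neg_f = -f
    return [fma(neg_f, b, d) for d, b in zip(vd, vs2)]

vd = [1.0, 2.0, 3.0]
vs2 = [4.0, 5.0, 6.0]
# Both forms compute the same lanes, so the separate negation is redundant.
print(fma_with_negated_scalar(vd, 0.5, vs2) == vfnmsac_vf(vd, 0.5, vs2))
```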

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
CAPIIR includes this in some of its source files, so we need to ensure
the header is around.
IR should be a splat of 7 as this compares vector of elements with 7
(`vec[i]!=7`). Having `zeroinitializer` goes against this comparison.

Co-authored-by: himadhith <himadhith.v@ibm.com>
This PR replaces `enum`s with `enum class`es in Python bindings. No
functional change.
This avoids repeatedly reparsing the URL to extract the number.
…0794)

We match GCC's coverage of the extension, that is, everything except
`setn` and `setnhi`.
See also: https://docs.oracle.com/cd/E53394_01/html/E54833/gmael.html
…#174745)

When activating a union member, none of the unions in that path can have
a non-trivial constructor. Unfortunately, this is something we have to
do when evaluating the bytecode, not while compiling it.
Remove unneeded load instructions.
…ns (llvm#173658)

Because
- no one should use them in explicit function call forms, and
- value-discarding casts to non-`void` are already diagnosed by Clang.

Also add explanation for this to `CodingGuidelines.rst`.
There were already tags for protected members in the Mustache template,
but they didn't use the proper tags for the newer JSON scheme.
…on RV32. (llvm#174703)

On RV32, i64 elements have type 'long long'.

Fixes llvm#174613.
…173873)

The current implementation does not drop a unit-extent dimension if that
dimension is indexed by a non-trivial affine expression (i.e., not a
single dimension or constant 0) on the first application of the
transformation. However, it is possible to drop such dimensions if all
dimensions involved in the affine expression are going to be dropped.
So far, this required repeated application of the transformation. With
the changes in this PR, the dimensions are dropped with a single
application of the transformation.

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>
ZVFH and ZVFHMIN have the same code generation here.
… or more words (llvm#174789)

This patch adds support for generating the `Xqcilsm` multi-word
load/store instructions for three or more words. We add a new function
in the `RISCVLoadStoreOptimizer` pass for doing this separate from the
one that does load/store pairing. The reason for this is that the
implementation currently only looks for consecutive loads and stores to
merge, whereas the pairing logic has no such restriction. We also only
traverse the basic block top down for now while looking for instructions
to merge.
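The top-down search for mergeable accesses can be pictured with a toy model. This is a sketch under stated assumptions, not the pass itself: instructions are plain tuples, "consecutive" is taken to mean adjacent instructions with the same opcode and base register whose offsets increase by one 4-byte word, and `find_mergeable_runs` is a hypothetical name. The real pass has to check further constraints that are ignored here.

```python
from typing import List, Tuple

Access = Tuple[str, str, int]  # (opcode, base register, byte offset)

def find_mergeable_runs(block: List[Access], word: int = 4,
                        min_len: int = 3) -> List[Tuple[int, int]]:
    """Scan top down and return [start, end) index ranges of runs of at
    least min_len adjacent accesses to consecutive words."""
    runs = []
    start = 0
    for i in range(1, len(block) + 1):
        extends = (
            i < len(block)
            and block[i][0] == block[i - 1][0]          # same opcode
            and block[i][1] == block[i - 1][1]          # same base register
            and block[i][2] == block[i - 1][2] + word   # next word
        )
        if not extends:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs

block = [
    ("lw", "a0", 0), ("lw", "a0", 4), ("lw", "a0", 8),  # three-word run
    ("sw", "a1", 0), ("sw", "a1", 4),                    # only a pair
    ("lw", "a0", 12),
]
print(find_mergeable_runs(block))  # [(0, 3)]
```

The two-word store run is left alone in this model, which matches the split described above: pairs belong to the existing pairing logic, and runs of three or more go to the new function.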
…174598)

This doesn't require p extension since it's just normal scalar
instructions, but they're normally used with other p extension
instructions so I just put them together.
mgcarrasco and others added 17 commits January 8, 2026 11:53
…74585)

Previously, the SPIRV BE only handled equality and inequality
comparisons (ICMP_EQ, ICMP_NE) for i1 types using logical operations
(OpLogicalEqual, OpLogicalNotEqual). Other comparison predicates
(signed/unsigned less than, greater than, etc.) triggered an unreachable
assertion.

This patch extends the support for the missing predicates. The BE
considers i1 values as booleans and in SPIR-V only logical operations
can work on them. This patch lowers the missing predicates into
supported logical operations.

The lowering has been validated with instcombine to avoid introducing an
unsound transformation (using the new test case).
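One way to see that ordered comparisons on i1 can be expressed with logical operations is to check candidate formulas exhaustively. The table below is an assumed set of lowerings for illustration and may differ from what the patch emits. It treats an i1 as 0/1 for unsigned predicates and as 0/-1 for signed predicates.

```python
from itertools import product

# Candidate boolean formulas for each ordered predicate on i1 operands.
LOWERING = {
    "ult": lambda a, b: (not a) and b,
    "ule": lambda a, b: (not a) or b,
    "ugt": lambda a, b: a and (not b),
    "uge": lambda a, b: a or (not b),
    "slt": lambda a, b: a and (not b),
    "sle": lambda a, b: a or (not b),
    "sgt": lambda a, b: (not a) and b,
    "sge": lambda a, b: (not a) or b,
}

def reference(pred: str, a: bool, b: bool) -> bool:
    """Integer comparison: true is 1 when unsigned and -1 when signed."""
    sign = -1 if pred[0] == "s" else 1
    x, y = sign * int(a), sign * int(b)
    return {"lt": x < y, "le": x <= y, "gt": x > y, "ge": x >= y}[pred[1:]]

def all_lowerings_sound() -> bool:
    # Exhaustive over the four input combinations for every predicate.
    return all(
        bool(LOWERING[p](a, b)) == reference(p, a, b)
        for p in LOWERING
        for a, b in product([False, True], repeat=2)
    )

print(all_lowerings_sound())  # True
```

The signed and unsigned forms swap roles because true is the largest unsigned i1 value but the smallest signed one.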
…ility check

The LLVM release version and the Apple LLDB version follow slightly different numbering schemes. Make sure we set the minimum required LLDB version appropriately.

Also refactors the `apple-lldb-pre-1000` feature check to use the same `get_lldb_version_string` method.

This was causing the LLDB LLVM formatters to be skipped on our public macOS CI.
The build mode was deprecated in llvm#136314. According to the
deprecation message, it was supposed to be removed in the LLVM 21
release. Each build mode increases the maintenance overhead when
failing, such as in llvm#151117.

Let's remove it in LLVM 22.
…en check for repeated load. NFC. (llvm#174950)

This will make it easier to handle other splat values that are free to
concat.

There should be no need to do repeated peekThroughBitcasts for every
(canonicalised) bitcasted operand.
…lvm#172652)

This patch adds static instruction table tests for the old Cyclone
scheduling model bound to `-mcpu=apple-m1`, for reference. It creates
a new `llvm/test/tools/llvm-mca/AArch64/Apple` directory, moves a
Cyclone test there, and adds 2 tests, `basic-instructions` and
`neon-instructions`, from Neoverse with reusable inputs (in addition to
Neoverse, we also match stderr output of llvm-mca for instruction
warnings).
…152189)"

This reverts commit 20d0ec8.

The publish-sphinx-docs buildbot still uses LLVM_ENABLE_PROJECTS=openmp.
According to this RFC
(https://discourse.llvm.org/t/rfc-adopt-regularly-scheduled-python-minimum-version-bumps/88841),
we no longer support Python versions that have reached EOL. This PR
mainly clarifies the somewhat vague wording of “a relatively recent
Python 3 installation.”
…t being hidden (llvm#174969)

By the time the maxps nodes have been lowered, the v4f32 constant splats
have been hidden behind an extract_subvector from the v8f32 constant
splat.
This patch adds the /linkreprofullpathrsp flag with the same behaviour
as link.exe. This flag emits a file containing the full paths to each
object passed to the link line.

This is used in particular when linking Arm64X binaries, as you need the
full path to all the Arm64 objects that were used in a standard Arm64
build.

See
https://learn.microsoft.com/en-us/cpp/build/reference/link-repro-full-path-rsp
for the Microsoft documentation of the flag.

Relands llvm#165449
@ronlieb ronlieb requested a review from ergawy January 8, 2026 14:21
@z1-cciauto z1-cciauto merged commit acbadea into amd-staging Jan 8, 2026
28 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20260108070058 branch January 8, 2026 17:07