merge main into amd-staging #1011

z1-cciauto · 2026-01-06T12:06:23Z

No description provided.

…summaries (llvm#174398)

Depends on:
* llvm#174385 (only last commit is relevant for this review)

The `${var%s}` format isn't capable of formatting references to C-strings, so the summary for those becomes `<no value available>`. This patch prevents the system C-string formatter from applying to references, which means the summary for such types will be empty. This prompts LLDB to instead print the child, which is the referenced C-string.

Before:
```
(lldb) v ref
(const char *&) ref = 0x000000016fdfe960 <no value available>
```

After:
```
(lldb) v ref
(const char *&) ref = 0x000000016fdfec40 (&ref = "hi")
```

An alternative would be to support references in the `ValueObject` dump methods. We assume C-strings are pointers/arrays in a lot of places, so such a fix would be a more intrusive undertaking, and I'm not sure we would want to support references there in the first place. So for now I went with the fallback logic in this PR.

…lvm#173449)

## Summary

This PR fixes llvm#173370, a crash in the `tosa-validate` pass that occurs when the input IR does not contain any TOSA operations.

**Crash Message:**
```text
LLVM ERROR: can't create Attribute 'mlir::tosa::TargetEnvAttr' because storage uniquer isn't initialized: the dialect was likely not loaded, or the attribute wasn't added with addAttributes<...>() in the Dialect::initialize() method.
```

## Problem

When `mlir-opt` parses an input file without TOSA operations, the `TosaDialect` is not lazily loaded. However, the `TosaValidation` pass previously called `lookupTargetEnvOrDefault` (which attempts to create a `TargetEnvAttr`) before checking if the dialect was loaded. This resulted in an assertion failure because the attribute storage uniquer was not initialized.

## Solution

I resolved the issue by placing the `TosaDialect` declaration at the very top of `runOnOperation`. This ensures that `lookupTargetEnvOrDefault` is not accessed when the dialect is uninitialized, preventing the crash.

## Test

Added two tests in `mlir/test/Dialect/Tosa/tosa_validation_init.mlir`.

The first case is without any TOSA operation.
```
// CHECK-LABEL: func.func @test_validation_pass_init
func.func @test_validation_pass_init(%arg0: tensor<1xf32>) -> tensor<1xf32> {
  // CHECK: math.asin
  %0 = math.asin %arg0 : tensor<1xf32>
  return %0 : tensor<1xf32>
}
```

The second case is with a TOSA operation.
```
// CHECK-LABEL: func.func @test_tosa_ops
func.func @test_tosa_ops(%arg0: tensor<1x2x3x4xf32>, %arg1: tensor<1x2x3x4xf32>) -> tensor<1x2x3x4xf32> {
  // CHECK: tosa.add
  %0 = tosa.add %arg0, %arg1 : (tensor<1x2x3x4xf32>, tensor<1x2x3x4xf32>) -> tensor<1x2x3x4xf32>
  return %0 : tensor<1x2x3x4xf32>
}
```

…() (llvm#171456)

Reapply after additional fixes in llvm#174426 and llvm#174431.

-----

Disable implicit truncation in the ConstantInt constructor by default. This means that it needs to be passed a signed/unsigned (depending on the IsSigned flag) value matching the bit width.

The intention is to prevent the recurring bug where people write something like `ConstantInt::get(Ty, -1)`, and this "works" until `Ty` is larger than 64-bit and then the value is incorrect due to missing type extension.

This is the continuation of llvm#112670, which originally allowed implicit truncation in this constructor to reduce initial scope of the change.
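To make the "value matching the bit width" requirement concrete, here is a minimal standalone sketch of that kind of range check. It does not use LLVM and is not the code from the patch; the helper names `fitsSigned`, `fitsUnsigned`, and `acceptsWithoutTruncation` are hypothetical and only model the idea that a 64-bit argument must be representable in the target width under the chosen signedness.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): true if `v`, read as a signed
// value, is representable in `width` bits (1 <= width).
constexpr bool fitsSigned(std::int64_t v, unsigned width) {
  if (width >= 64)
    return true;
  const std::int64_t lo = -(std::int64_t{1} << (width - 1));
  const std::int64_t hi = (std::int64_t{1} << (width - 1)) - 1;
  return v >= lo && v <= hi;
}

// Hypothetical helper: true if `v`, read as an unsigned value, is
// representable in `width` bits.
constexpr bool fitsUnsigned(std::uint64_t v, unsigned width) {
  if (width >= 64)
    return true;
  return v < (std::uint64_t{1} << width);
}

// Models the decision a strict constructor could make: accept the raw
// 64-bit argument only if no truncation would be needed.
inline bool acceptsWithoutTruncation(std::uint64_t raw, unsigned width,
                                     bool isSigned) {
  assert(width >= 1 && "bit width must be positive");
  return isSigned ? fitsSigned(static_cast<std::int64_t>(raw), width)
                  : fitsUnsigned(raw, width);
}
```

Under this model, passing `-1` for a 32-bit type is accepted only when the value is declared signed; the same bit pattern treated as unsigned is all 64 ones and does not fit in 32 bits. That is the situation the `getSigned()` fix for a negative offset hint addresses.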

This patch follows PR #421 [1] from the ACLE.

These two FP8 intrinsics had `_single` removed from their names:
from ``svmla[_single]_za16[_mf8]_vg2x1_fpm`` to ``svmla_za16[_mf8]_vg2x1_fpm`` and
from ``svmla[_single]_za32[_mf8]_vg4x1_fpm`` to ``svmla_za32[_mf8]_vg4x1_fpm``

[1] ARM-software/acle#421

…imes layout (llvm#172316)

Fixes llvm#172024

This is something a lot of people can probably figure out themselves, but having this obvious wrong turn in the getting started document isn't a good first impression.

So I've added a note to highlight how to deal with it. I don't want to go into detail there about the layout itself, but it should be enough that people know to check by listing the contents of the lib/ folder.

…d code (llvm#174105)

We add barriers to the firstprivate copy region when they are required to avoid a race condition with the lastprivate clause. The problem is that these barriers are added by the compiler, not implied by user code, so it is the compiler's problem to avoid deadlock.

I came across a testcase whilst working on taskloop support that looks a bit like this:

```
!$omp parallel
!$omp single
!$omp taskloop firstprivate(a) lastprivate(a)
...
!$omp end single
!$omp end parallel
```

This is so that there are multiple threads for the generated tasks to be distributed over, but we don't generate the tasks afresh in every thread.

The problem comes when the taskloop requires a barrier to prevent the data race between firstprivate and lastprivate. This barrier will then be generated inside of SINGLE, so only one thread will encounter the barrier, leading to a deadlock.

This patch works around the problem by detecting this situation statically and then not generating the barrier.

There are cases where we cannot detect this statically (e.g. if the TASKLOOP is inside a function call inside of SINGLE). The program will still deadlock in this case after my patch. I'm unsure what the solution would be for that case. I want to fix this simple case in LLVM 22 before engaging in a longer discussion as to whether there is a better way to handle the more general case.

Testing using wsloop because I want to land this (or not) independently of taskloop. Note that for wsloop it would be up to the programmer to remember to use the nowait clause, but nowait cannot be used to control generation of this barrier because it refers to the barrier after the construct, not the one after the firstprivate copy-in (before the construct executes).

)

Reverts llvm#172477

This is causing failures for RVA23 (including some tests running away in their execution causing OOM, hence the builder dying). I will attempt to follow up on the PR with a reproducer of some kind.

https://lab.llvm.org/buildbot/#/builders/210/builds/7243

@mshockwave

…llvm#174421)

llvm-mca currently attempts to read the input file (or stdin) even when invoked with -mcpu=help. When the input is stdin, this causes the tool to block unless an empty stdin is provided.

This patch now allows the available CPUs/features to be printed without requiring stdin, while existing behaviour for all other invocations still requires stdin.

- mcpu-help.test has been added

Follow-on from reverted llvm#173399. Implements @mshockwave's suggestion.

This patch was generated by the following commands:

1. `npm install --save-dev prettier-plugin-organize-imports`
2. `npm run format`
3. `npm audit fix`

It partially addresses [issue](llvm#151598) and improves quality of ts code (formatting and unused imports).

…lvm#169445)

In llvm#168534 we made the `TypePrinter` re-use `printNestedNameSpecifier` for printing scopes. However, the way that the names of anonymous/unnamed types get printed by the two are slightly inconsistent with each other.

`printNestedNameSpecifier` calls all `TagType`s without an identifier `(anonymous)`. On the other hand, `TypePrinter` prints them slightly more accurately (it differentiates anonymous vs. unnamed decls) and allows for some additional customization points. E.g., with `MSVCFormatting`, it will print `` `unnamed struct'`` instead of `(unnamed struct)`. `printNestedNameSpecifier` already accounts for `MSVCFormatting` for namespaces, but doesn't for `TagType`s. This inconsistency means that if an unnamed tag is printed as part of a scope then it's displayed as `(anonymous struct)`, but if it's the entity whose scope is being printed, then it shows as `(unnamed struct)`.

This patch moves the printing of anonymous/unnamed tags into `TagDecl::printName`. All the callsites that previously printed anonymous tag decls now call `printName` to handle it. To preserve the behaviour of not printing the kind name (i.e., `struct`/`class`/`enum`) when printing the inner type of an elaborated type (i.e., avoiding `struct (unnamed struct)`), this patch adds a `PrintingPolicy::SuppressTagKeywordInAnonNames` that is appropriately set when we want to suppress the tag keyword inside the anonymous name. I had to make sure we set this bit to `false` when printing nested-name-specifiers because we always want the tag keyword there (e.g., `foo::(anonymous struct)::bar`) and for a `clangd` special case which is described in a comment in the source.

**Test changes**

Mostly we now more accurately print the kind name of anonymous entities. So there's a lot of `anonymous` -> `unnamed` changes.

There are a handful of `clangd` tests where the name of the entity is now `(unnamed struct)` instead of just `(unnamed)`. That should be consistent with how we choose to omit the tag keyword elsewhere. Since we're just printing the name of the entity here, we include the kind tag.
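For readers unfamiliar with the anonymous vs. unnamed distinction mentioned above, the following small sketch shows one declaration of each kind, as I understand the terminology. The names `Outer`, `pos`, and `sumFields` are made up for illustration and do not come from the patch or its tests; the exact strings a given tool prints for these types depend on the printing policy described in the commit message.

```cpp
#include <cassert>

struct Outer {
  // Unnamed struct: the type has no tag name, but the declaration
  // introduces a member `pos`, so fields are reached through it.
  struct {
    int x;
    int y;
  } pos;

  // Anonymous union: no tag name and no declarator, so `i` and `f`
  // are accessed directly as if they were members of Outer.
  union {
    int i;
    float f;
  };
};

// Touches both kinds of members to show how each is accessed.
inline int sumFields() {
  Outer o{};
  o.pos.x = 2;  // via the named member of unnamed struct type
  o.pos.y = 3;
  o.i = 4;      // directly, through the anonymous union
  assert(o.pos.x == 2);
  return o.pos.x + o.pos.y + o.i;
}
```

The example uses an anonymous union rather than an anonymous struct because anonymous structs are a language extension in C++, while anonymous unions are standard.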

Make llvm-exegesis more usable on AArch64 by doing the following:

- Add some missing exegesis handling of register classes;
- Add some missing LLVM AArch64 OperandTypes.

llvm-exegesis can now handle many more AArch64 instructions. AArch64 load/store instructions are not yet supported by llvm-exegesis, until llvm#144895 lands.

---------

Co-authored-by: Cullen Rhodes <cullen.rhodes@arm.com>

…les (llvm#172042)

[llvm#172040](llvm#172040)

This patch implements the scripts for generating the lookup tables and associated utils for wctype classification functions. Not all Unicode properties are covered, as not all of them need a lookup table; the rest will be hardcoded.

The size of the generated tables is 47.8 KB.

z1-cciauto · 2026-01-06T12:08:10Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3467

nikic and others added 30 commits January 6, 2026 09:11

[ItaniumCXXABI] Use getSigned() for offset hint (llvm#174431)

a87af1d

The offset hint can be negative, so we should use getSigned() here. This avoids an assertion failure with llvm#171456. Extend the dynamic_cast tests to include a 32-bit target to cover this case.

InstCombine: Add baseline tests for fmul SimplifyDemandedFPClass hand…

4af552d

…ling (llvm#173871)

[AMDGPU] Removing unwanted delta in llc-pipeline-npm.ll. (llvm#174535)

51bb529

[VPlan] Simplify pow-of-2 (mul|udiv) -> (shl|lshr) (llvm#172477)

a2db31b

[RISCV] Don't generate QC.SWMI pair if the start reg is X0 (llvm#174532)

8a8447f

The register containing the values stored by `QC.SWMI` needs to be `GPRNoX0`.

[lldb-dap] Add skipIfWindowsWithoutConPTY decorator for clearer test …

ff20747

…requirements (llvm#174509) Replace version matching with the new decorator to prevent typos, and to make it clear why we skipped the test.

[X86] combineOr - convert `OR(X,KSHIFTL(Y,Elts/2)) -> CONCAT_VECTORS(…

004c420

…X,Y)` fold to SDPatternMatch. NFC. (llvm#174554) Merge the pair of commuted patterns.

[AArch64] Add intrinsics for 9.6 crypto instructions (llvm#165545)

ef86355

This patch adds intrinsics for the crypto instructions defined in the ACLE proposal ARM-software/acle#411.

[lldb][NFC] use parenthesis constructor (llvm#174556)

371903c

Follow the LLVM coding standard

[AMDGPU] Modifies fdot2 builtin def to take _Float16 for HIP/C++ (llv…

ba1b867

…m#174527) For dl `__builtin_amdgcn_fdot2` builtins, use 'x' in the def so that they take _Float16 for HIP/C++ and half for OpenCL.

[clang] Enable fat-lto-object support for COFF targets (llvm#172936)

2bfb984

This change allows using clang's `-ffat-lto-objects` flag with COFF targets such as `i386-pc-win32`. Follow-up to 759fb0a from which it was split off. The added tests are adapted from the pre-existing ELF tests.

[libc++] Warn when users request a deployment target below the minimu…

25becc3

…m supported version (llvm#172664)

llvm: Convert some assorted lit tests to opaque pointers (llvm#174564)

56ce7ed

Some of the MIR tests hit a bug that produces an error if there is a raw global reference as the referenced value. Worked around some of those by just keeping a no-op bitcast constant expression.

ValueTracking: Reapply remainder of fadd handling from llvm#174290 (l…

3d59a4d

…lvm#174569) Reapply the zero handling, reverted in 108a22e. The failing libc test should have been fixed by e25eacf.

AMDGPU: clang-format AMDGPULowerKernelAttributes (llvm#174567)

b8f5cbb

[X86] Allow EVEX compression for mask registers (llvm#171980)

1caf270

This patch extends the X86CompressEVEX pass to recognize and compress multi-instruction masking patterns to MOVMSK instructions. Fixes llvm#171746

Revert "Reapply [ConstantInt] Disable implicit truncation in Constant…

0e789a8

…Int::get() (llvm#171456)" This reverts commit d189b49. Still causes assertion failures on some buildbots.

[MLIR] Propagate known cluster sizes from gpu.launch to gpu.func (llv…

9a93769

…m#174404) This lets us properly annotate ranges for gpu.cluster_block_id and gpu.cluster_dim_blocks. It also allows us to fill in the nvvm.cluster_dim attribute for use in the NVVM backend.

arsenm and others added 4 commits January 6, 2026 12:50

unittests: Convert some IR in unit tests to opaque pointers (llvm#174562

047aaa5

)

[X86] combineOr - attempt to concat OR(X,KSHIFTL(Y,Elts/2)) sub-16-…

6f01dea

…element masks (llvm#174570) We can't convert these to CONCAT_VECTORS/KUNPCK, but we might be able to concat the operands directly.

SPIRV: Convert tests to opaque pointers (llvm#174563)

1db53f1

merge main into amd-staging

5775672

z1-cciauto requested a review from fabianmcg as a code owner January 6, 2026 12:06

z1-cciauto requested a review from a team January 6, 2026 12:06

ronlieb approved these changes Jan 6, 2026

z1-cciauto merged commit 57baf6b into amd-staging Jan 6, 2026
14 checks passed

z1-cciauto deleted the upstream_merge_202601060706 branch January 6, 2026 14:49
