forked from llvm/llvm-project
merge main into amd-staging #816
Merged
To allow arm64 code to run in MTE environments, the compiler must assume that only the top 4 bits of a pointer can be ignored, because MTE occupies the lower 4 bits of the top byte. rdar://164645323
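A minimal sketch of the bit layout this commit is concerned with, assuming 64-bit pointers where the ignorable top byte is bits 63:56 and the MTE tag sits in bits 59:56. The helper names are hypothetical and this is not the compiler's actual code; it only shows why stripping the whole top byte would destroy the tag while stripping the top 4 bits would not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not LLVM's implementation.
// Assumption: 64-bit addresses, top byte = bits 63:56, MTE tag = bits 59:56.
constexpr std::uint64_t kTopByteMask   = 0xFF00000000000000ULL; // bits 63:56
constexpr std::uint64_t kTopNibbleMask = 0xF000000000000000ULL; // bits 63:60
constexpr std::uint64_t kMteTagMask    = 0x0F00000000000000ULL; // bits 59:56

// Without MTE, the whole top byte may be treated as ignorable.
constexpr std::uint64_t stripTopByte(std::uint64_t p) { return p & ~kTopByteMask; }

// With MTE, only the top 4 bits may be dropped; the tag must survive.
constexpr std::uint64_t stripTopNibble(std::uint64_t p) { return p & ~kTopNibbleMask; }

// Extracts the 4-bit tag from bits 59:56.
constexpr std::uint64_t mteTag(std::uint64_t p) { return (p & kMteTagMask) >> 56; }
```

For a pointer value such as `0xAB00000000001000`, `stripTopNibble` keeps the tag nibble `0xB`, whereas `stripTopByte` clears it.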
…1236) This commit leaves "b" aliased to the old `_regexp-break` for now. The two variants are identical except that `_regexp-break` allows you to say: `(lldb) b <unrecognized_input> ` which gets translated to: `break set <unrecognized_input> ` So switching people to `_regexp-break-add` would be a surprising behavior change. It would be wrong for `_regexp-break-add` to have one branch that calls `break set`, so to avoid surprise, I'll add the command, and people who are experimenting with `break add` instead of `break set` can set the alias to the new one by hand for now.
Having duplicate mode entries previously asserted (or silently replaced the last value with a new one in release builds). Report an error with a helpful message instead. Pull Request: llvm#171715
I have a change to validate the operand classes emitted in the AsmParser and that caused llvm/test/MC/RISCV/rv32p-valid.s to fail due to the rd_wb register using a different register class from rd: `PWADDA_H operand 1 register X6 is not a member of register class GPRPair` This happens because tablegen's AsmMatcherEmitter emits code to literally copy over the tied registers and does not feed them through the equivalent of RISCVAsmParser::validateTargetOperandClass() which would allow adjusting these operand classes. Ideally we would handle this in tablegen (or at least add an error), but the tied operand handling logic is rather complex and I don't understand it yet. For now just update the rd register class to match rd_wb. Pull Request: llvm#171738
…TINS_DIR to COMPILER_RT_TEST_BUILTINS_DIR (llvm#171741) Co-authored-by: David Tenty <daltenty@ibm.com>
llvm#171745) … instrs. (llvm#169779)" This reverts commit 2b958b9. I might have broken the sanitizer-x86_64-linux bot /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_procmaps_linux.cpp clang++: /home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:248: const T &llvm::ArrayRef<llvm::DbgValueLocEntry>::operator[](size_t) const [T = llvm::DbgValueLocEntry]: Assertion `Index < Length && "Invalid index!"' failed.
…llvm#171721) This enables MachineVerifier and MachineIR printing support for these operands.
…em under issue 171751
Add support for allocate statement with a source that is a device variable.
Co-authored-by: Jérôme Duval <jerome.duval@gmail.com>
…pile commands (llvm#169640) This patch fixes an issue in progress reporting where the processed item counter could exceed the total item count, leading to confusing outputs like [22/18]. Closes [llvm#169168](llvm#169168)
Previously, `isOSGlibc()` was returning true for musl triples as well. This commit changes `isOSGlibc()` to return false for musl triples, and updates all existing `isOSGlibc()` checks to call `isOSGlibc() || isMusl()`, in order to preserve existing behaviour.
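The call-site pattern described above can be sketched as follows. `TripleSketch` and `Env` are hypothetical stand-ins, not `llvm::Triple`; the real predicates consider more than the environment component, so treat this only as an illustration of the behaviour change.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Triple; only the two predicates named in
// the commit message are modelled, and the enum is an assumption.
enum class Env { GNU, Musl, Other };

struct TripleSketch {
  Env env;

  bool isMusl() const { return env == Env::Musl; }

  // New behaviour described above: a musl triple no longer counts as glibc.
  bool isOSGlibc() const { return env == Env::GNU; }
};

// Call sites that relied on the old answer now spell out both cases,
// which preserves the previous result for musl triples.
inline bool usesGlibcOrMusl(const TripleSketch &t) {
  return t.isOSGlibc() || t.isMusl();
}
```

With this shape, `isOSGlibc()` alone distinguishes glibc from musl, while `usesGlibcOrMusl` returns what the old `isOSGlibc()` would have returned.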
…inMax. NFC (llvm#171736) This operand is always a register.
…lvm#171747) Replace dyn_cast with cast. The dyn_cast can never fail now. Previously it never succeeded.
This implements WG14 N3734 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3734.pdf), aka `_Defer`; it is currently only supported in C if `-fdefer-ts` is passed.
This patch adds support for the `ud ui5` macro instruction. The `ui5` operand must be in the range `0-31`. The macro expands to: `amswap.w $rd, $r1, $rj` where `ui5` specifies the register number used for `$rd` in the expanded instruction, and `$rd` is the same as `$rj`. Relevant binutils patch: https://sourceware.org/pipermail/binutils/2025-December/146042.html
…lvm#171079) This patch adds support for generating the Xqcilsm load/store multiple instructions as part of the RISCVLoadStoreOptimizer pass. For now we only combine two load/store instructions into a load/store multiple. Support for converting more loads/stores will be added in follow-up patches. These instructions are only applicable to 32-bit loads/stores with an alignment of 4 bytes.
…71568) Changed the range computation in computeOverflowForUnsignedMul to use computeConstantRange as well. This expands the set of patterns in which InstCombine manages to narrow a mul whose operands come from zext. For example, if a value comes from a div operation, the known bits don't give the narrowest possible range for that value. --------- Co-authored-by: Adar Dagan <adar.dagan@mobileye.com>
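A small worked example of why a range can be tighter than known bits, under the assumption that an 8-bit value `x` is zero-extended and divided by 3, so the quotient is at most 85. The helper names are hypothetical and do not mirror LLVM's API. Known leading-zero bits can only say "fits in 7 bits" (at most 127), while the range says "at most 85", and only the latter proves that multiplying by 3 stays within 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only; hypothetical helpers, not LLVM's KnownBits/ConstantRange.

// Upper bound obtainable from leading-zero information alone: the smallest
// all-ones value that is >= maxValue.
constexpr std::uint32_t knownBitsUpperBound(std::uint32_t maxValue) {
  std::uint32_t mask = 0;
  while (mask < maxValue)
    mask = (mask << 1) | 1u;
  return mask;
}

// True if the product of two values bounded by maxA and maxB cannot exceed
// the unsigned 8-bit maximum.
constexpr bool mulFitsIn8Bits(std::uint32_t maxA, std::uint32_t maxB) {
  return static_cast<std::uint64_t>(maxA) * maxB <= 0xFFu;
}
```

Here `knownBitsUpperBound(85)` is 127, and 127 * 3 = 381 does not fit in 8 bits, whereas the range bound gives 85 * 3 = 255, which does.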
…#171643) Previously this only happened for constants of some types and missed incorrect ptrtoaddr.
llvm#162653) This folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)`, matching the existing ptrtoint fold. Restrict both folds to only the case where the result type matches the address type. I think that all folds this can do in practice end up actually being valid for ptrtoint to a type larger than the address size as well, but I don't really see a way to justify this generically without making assumptions about what kind of folding the recursive calls may do. This is based on the icmp semantics specified in llvm#163936.
-- This commit is the fourth in the series of adding matchers for linalg.*conv*/*pool*. Refer: llvm#163724 -- In this commit all variants of Conv2D convolution ops have been added. -- It also refactors the way these matchers work to make adding more matchers concise. Signed-off-by: Abhishek Varma <abhvarma@amd.com> --------- Signed-off-by: Abhishek Varma <abhvarma@amd.com> Signed-off-by: hanhanW <hanhan0912@gmail.com> Co-authored-by: hanhanW <hanhan0912@gmail.com>
…ames. NFC. (llvm#171645) Both `decomposeBitTestICmp` and `decomposeBitTest` have a parameter called `lookThroughTrunc`. This was spelled in full (i.e. `lookThroughTrunc`) in the header. However, in the implementation, it's written as `lookThruTrunc`. I opted to convert all instances of `lookThruTrunc` into `lookThroughTrunc` to reduce surprise while reading the code and for conformity. --- The other change in this PR is the renaming of the wrapper around `decomposeBitTest()`. Even though it was a wrapper around `CmpInstAnalysis.h`'s `decomposeBitTest`, the function was called `decomposeBitTestICmp`. This is quite confusing because such a function _also_ exists in `CmpInstAnalysis.h`, but it is _not_ the one actually being used in `InstCombineAndOrXor.cpp`.
Add `f64:32:64` to the data layout for AIX, to indicate that doubles have a 32-bit ABI alignment and 64-bit preferred alignment. Clang was already taking this into account, but it was not reflected in LLVM's data layout. A notable effect of this change is that `double` loads/stores with 4 byte alignment are no longer considered "unaligned" and avoid the corresponding unaligned access legalization. I assume that this is correct/desired for AIX. (The codegen previously already relied on this in some places related to the call ABI simply by dint of assuming certain stack locations were 8 byte aligned, even though they were only actually 4 byte aligned.) Fixes llvm#133599.
…171072) This patch tries to move all vl patterns and sd node patterns to RISCVInstrInfoVVLPatterns.td and RISCVInstrInfoVSDPatterns.td respectively. It removes the redefinition of pattern classes for zvfbfa and makes them easier to maintain and change. Note: this does not include intrinsic patterns; if we also want to unify intrinsic patterns, we need to move the pseudo instruction definitions of zvfbfa to RISCVInstrInfoVPseudos.td as well.
…teExtInst` instead of `SPIRVRegularizer` (llvm#170155) This patch consists of 2 parts:
* A first part that removes the scalar to vector promotion for built-ins in the `SPIRVRegularizer`;
* and a second part that implements the promotion for built-ins from scalar to vector in `generateExtInst`.
The implementation in `SPIRVRegularizer` had several issues:
* It rolled its own built-in pattern matching that was extremely permissive.
* The compiler would crash if the built-in had a definition.
* The compiler would crash if the built-in had no arguments.
* The compiler would crash if there were more than 2 function definitions in the module.
* It'd be better if this was implemented as a module pass, where we iterate over the users of the function instead of scanning the whole module for callers.
This patch does the scalar to vector promotion just before the `OpExtInst` is generated, without relying on the IR transformation. One change in the generated code from the previous implementation is that this version uses a single `OpCompositeConstruct` operation to convert the scalar into a vector. The old implementation inserted an element at the 0 position in an `undef` vector (using `OpCompositeInsert`), then copied that element for every vector element using `OpVectorShuffle`.
This patch also adds a test (`OpExtInst_vector_promotion_bug.ll`) that highlights an issue in the builtin pattern matching that we're using: our pattern matching doesn't consider the number of arguments, only the demangled name and the first and last arguments (`min(int,int,int)` matches the same builtin as `min(int, int)`).
Before this patch, `insertelement/extractelement` with dynamic indices would fail to select with `-O0` for vector 32-bit element types with sizes 3, 5, 6 and 7, which did not map to a `SI_INDIRECT_SRC/DST` pattern. Other "weird" sizes bigger than 8 (like 13) are properly handled already. To solve this issue we add the missing patterns for the problematic sizes. Solves SWDEV-568862
…llvm#171651) Allocators should be extremely cheap, if not free, to copy. Furthermore, we have requirements on allocator types that copies must compare equal, and that move and copy must be the same. Hence, taking an allocator by reference should not provide benefits beyond making a copy of it. However, taking the allocator by reference leads to complexity in __split_buffer, which can be removed if we stop using that pattern.
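A minimal sketch of the by-value pattern this commit argues for, using a hypothetical `BufferSketch` class rather than libc++'s `__split_buffer`. It relies only on the standard guarantee that a copy of an allocator compares equal to the original, so the stored copy can deallocate what it allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical buffer type for illustration; not libc++'s __split_buffer.
template <class T, class Alloc = std::allocator<T>>
class BufferSketch {
  using Traits = std::allocator_traits<Alloc>;

public:
  // The allocator is taken by value and stored as a copy, not by reference.
  BufferSketch(std::size_t n, Alloc alloc)
      : alloc_(alloc), cap_(n), data_(Traits::allocate(alloc_, n)) {}

  ~BufferSketch() { Traits::deallocate(alloc_, data_, cap_); }

  BufferSketch(const BufferSketch &) = delete;
  BufferSketch &operator=(const BufferSketch &) = delete;

  Alloc get_allocator() const { return alloc_; }
  std::size_t capacity() const { return cap_; }

private:
  Alloc alloc_;                   // owned copy; declared before data_ on purpose
  std::size_t cap_;               // number of T slots allocated
  typename Traits::pointer data_; // raw storage; no elements are constructed
};
```

Because the copy compares equal to the caller's allocator, code such as `BufferSketch<int> buf(4, a); assert(buf.get_allocator() == a);` holds without the buffer ever referring back to `a`.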
Some [ideas for improvement](llvm#169858 (review)) came up during review of recent changes to `isTRNMask`. This PR applies them also to `isZIPMask`, which is implemented almost identically.
This essentially reverts llvm#100685 and fixes the bidirectional and random access specializations to be actually used.
```
Benchmark  old  new  Difference  % Difference
------------------------------------------------------------ -------------- -------------- ------------ --------------
rng::find_end(deque<int>)_(match_near_end)/1000  366.91  47.63  -319.28  -87.02%
rng::find_end(deque<int>)_(match_near_end)/1024  3273.31  35.42  -3237.89  -98.92%
rng::find_end(deque<int>)_(match_near_end)/8192  171608.41  285.04  -171323.38  -99.83%
rng::find_end(deque<int>)_(near_matches)/1000  31808.40  19214.35  -12594.05  -39.59%
rng::find_end(deque<int>)_(near_matches)/1024  37428.72  20773.87  -16654.85  -44.50%
rng::find_end(deque<int>)_(near_matches)/8192  1719468.34  1213967.45  -505500.89  -29.40%
rng::find_end(deque<int>)_(process_all)/1000  275.81  336.29  60.49  21.93%
rng::find_end(deque<int>)_(process_all)/1024  258.88  320.36  61.47  23.74%
rng::find_end(deque<int>)_(process_all)/1048576  277117.41  327640.37  50522.96  18.23%
rng::find_end(deque<int>)_(process_all)/8192  2166.36  2533.52  367.16  16.95%
rng::find_end(deque<int>)_(same_length)/1000  1280.06  362.53  -917.53  -71.68%
rng::find_end(deque<int>)_(same_length)/1024  1419.99  417.58  -1002.40  -70.59%
rng::find_end(deque<int>)_(same_length)/8192  11363.81  2870.63  -8493.18  -74.74%
rng::find_end(deque<int>)_(single_element)/1000  277.22  363.52  86.31  31.13%
rng::find_end(deque<int>)_(single_element)/1024  257.11  353.94  96.84  37.66%
rng::find_end(deque<int>)_(single_element)/8192  2059.02  2762.29  703.27  34.16%
rng::find_end(deque<int>,_pred)_(match_near_end)/1000  696.84  70.07  -626.77  -89.94%
rng::find_end(deque<int>,_pred)_(match_near_end)/1024  4774.82  70.75  -4704.07  -98.52%
rng::find_end(deque<int>,_pred)_(match_near_end)/8192  267492.37  549.57  -266942.81  -99.79%
rng::find_end(deque<int>,_pred)_(near_matches)/1000  39414.88  31070.43  -8344.46  -21.17%
rng::find_end(deque<int>,_pred)_(near_matches)/1024  38168.52  32362.18  -5806.34  -15.21%
rng::find_end(deque<int>,_pred)_(near_matches)/8192  2594717.16  1938056.79  -656660.38  -25.31%
rng::find_end(deque<int>,_pred)_(process_all)/1000  600.88  586.92  -13.96  -2.32%
rng::find_end(deque<int>,_pred)_(process_all)/1024  613.00  592.66  -20.33  -3.32%
rng::find_end(deque<int>,_pred)_(process_all)/1048576  600059.65  603440.98  3381.33  0.56%
rng::find_end(deque<int>,_pred)_(process_all)/8192  4850.32  4764.56  -85.76  -1.77%
rng::find_end(deque<int>,_pred)_(same_length)/1000  1514.90  700.34  -814.57  -53.77%
rng::find_end(deque<int>,_pred)_(same_length)/1024  1561.14  705.80  -855.34  -54.79%
rng::find_end(deque<int>,_pred)_(same_length)/8192  12544.84  5024.45  -7520.39  -59.95%
rng::find_end(deque<int>,_pred)_(single_element)/1000  603.79  650.63  46.84  7.76%
rng::find_end(deque<int>,_pred)_(single_element)/1024  614.93  656.43  41.50  6.75%
rng::find_end(deque<int>,_pred)_(single_element)/8192  4885.89  5225.71  339.82  6.96%
rng::find_end(forward_list<int>)_(match_near_end)/1000  770.05  769.32  -0.73  -0.09%
rng::find_end(forward_list<int>)_(match_near_end)/1024  4833.13  4733.24  -99.90  -2.07%
rng::find_end(forward_list<int>)_(match_near_end)/8192  259324.32  261066.84  1742.52  0.67%
rng::find_end(forward_list<int>)_(near_matches)/1000  38301.11  38608.61  307.50  0.80%
rng::find_end(forward_list<int>)_(near_matches)/1024  39370.54  39878.59  508.05  1.29%
rng::find_end(forward_list<int>)_(near_matches)/8192  2527338.50  2527722.47  383.97  0.02%
rng::find_end(forward_list<int>)_(process_all)/1000  713.63  720.74  7.11  1.00%
rng::find_end(forward_list<int>)_(process_all)/1024  727.81  731.60  3.79  0.52%
rng::find_end(forward_list<int>)_(process_all)/1048576  757728.47  766470.14  8741.67  1.15%
rng::find_end(forward_list<int>)_(process_all)/8192  5821.05  5817.80  -3.25  -0.06%
rng::find_end(forward_list<int>)_(same_length)/1000  1458.99  1454.50  -4.49  -0.31%
rng::find_end(forward_list<int>)_(same_length)/1024  1507.73  1515.78  8.05  0.53%
rng::find_end(forward_list<int>)_(same_length)/8192  20432.32  18658.93  -1773.39  -8.68%
rng::find_end(forward_list<int>)_(single_element)/1000  712.41  708.41  -4.00  -0.56%
rng::find_end(forward_list<int>)_(single_element)/1024  728.05  728.78  0.73  0.10%
rng::find_end(forward_list<int>)_(single_element)/8192  5795.48  6332.88  537.40  9.27%
rng::find_end(forward_list<int>,_pred)_(match_near_end)/1000  843.67  846.77  3.10  0.37%
rng::find_end(forward_list<int>,_pred)_(match_near_end)/1024  5267.90  5343.84  75.94  1.44%
rng::find_end(forward_list<int>,_pred)_(match_near_end)/8192  280912.75  286141.10  5228.35  1.86%
rng::find_end(forward_list<int>,_pred)_(near_matches)/1000  43386.35  44489.38  1103.03  2.54%
rng::find_end(forward_list<int>,_pred)_(near_matches)/1024  44929.84  45608.55  678.71  1.51%
rng::find_end(forward_list<int>,_pred)_(near_matches)/8192  2723281.29  2765369.43  42088.14  1.55%
rng::find_end(forward_list<int>,_pred)_(process_all)/1000  763.13  763.85  0.72  0.09%
rng::find_end(forward_list<int>,_pred)_(process_all)/1024  796.98  773.40  -23.58  -2.96%
rng::find_end(forward_list<int>,_pred)_(process_all)/1048576  858071.76  846166.06  -11905.69  -1.39%
rng::find_end(forward_list<int>,_pred)_(process_all)/8192  6282.19  6244.95  -37.24  -0.59%
rng::find_end(forward_list<int>,_pred)_(same_length)/1000  1560.18  1583.03  22.86  1.47%
rng::find_end(forward_list<int>,_pred)_(same_length)/1024  1603.94  1612.22  8.28  0.52%
rng::find_end(forward_list<int>,_pred)_(same_length)/8192  16907.98  15638.35  -1269.63  -7.51%
rng::find_end(forward_list<int>,_pred)_(single_element)/1000  746.72  754.08  7.36  0.99%
rng::find_end(forward_list<int>,_pred)_(single_element)/1024  761.27  771.75  10.48  1.38%
rng::find_end(forward_list<int>,_pred)_(single_element)/8192  6166.83  6687.87  521.04  8.45%
rng::find_end(list<int>)_(match_near_end)/1000  793.99  67.06  -726.93  -91.55%
rng::find_end(list<int>)_(match_near_end)/1024  4682.12  79.82  -4602.31  -98.30%
rng::find_end(list<int>)_(match_near_end)/8192  263187.10  582.64  -262604.46  -99.78%
rng::find_end(list<int>)_(near_matches)/1000  38066.70  34687.59  -3379.11  -8.88%
rng::find_end(list<int>)_(near_matches)/1024  39721.77  36150.04  -3571.73  -8.99%
rng::find_end(list<int>)_(near_matches)/8192  2543369.85  2247297.03  -296072.82  -11.64%
rng::find_end(list<int>)_(process_all)/1000  716.89  726.65  9.76  1.36%
rng::find_end(list<int>)_(process_all)/1024  742.41  744.05  1.64  0.22%
rng::find_end(list<int>)_(process_all)/1048576  822449.08  873801.46  51352.38  6.24%
rng::find_end(list<int>)_(process_all)/8192  7704.49  9766.50  2062.02  26.76%
rng::find_end(list<int>)_(same_length)/1000  1508.19  710.90  -797.28  -52.86%
rng::find_end(list<int>)_(same_length)/1024  1540.23  735.35  -804.88  -52.26%
rng::find_end(list<int>)_(same_length)/8192  22786.44  10752.45  -12033.98  -52.81%
rng::find_end(list<int>)_(single_element)/1000  699.16  734.76  35.60  5.09%
rng::find_end(list<int>)_(single_element)/1024  717.09  750.91  33.82  4.72%
rng::find_end(list<int>)_(single_element)/8192  9502.45  10289.21  786.76  8.28%
rng::find_end(list<int>,_pred)_(match_near_end)/1000  841.98  83.86  -758.12  -90.04%
rng::find_end(list<int>,_pred)_(match_near_end)/1024  5463.71  76.95  -5386.76  -98.59%
rng::find_end(list<int>,_pred)_(match_near_end)/8192  287070.76  647.14  -286423.62  -99.77%
rng::find_end(list<int>,_pred)_(near_matches)/1000  43878.61  38899.00  -4979.61  -11.35%
rng::find_end(list<int>,_pred)_(near_matches)/1024  45672.50  40520.68  -5151.82  -11.28%
rng::find_end(list<int>,_pred)_(near_matches)/8192  2764800.76  2495879.89  -268920.87  -9.73%
rng::find_end(list<int>,_pred)_(process_all)/1000  764.46  774.78  10.32  1.35%
rng::find_end(list<int>,_pred)_(process_all)/1024  786.81  793.05  6.24  0.79%
rng::find_end(list<int>,_pred)_(process_all)/1048576  934166.34  954637.60  20471.26  2.19%
rng::find_end(list<int>,_pred)_(process_all)/8192  9509.24  10209.73  700.49  7.37%
rng::find_end(list<int>,_pred)_(same_length)/1000  1545.67  782.96  -762.71  -49.34%
rng::find_end(list<int>,_pred)_(same_length)/1024  1580.94  796.87  -784.08  -49.60%
rng::find_end(list<int>,_pred)_(same_length)/8192  21558.41  13370.92  -8187.49  -37.98%
rng::find_end(list<int>,_pred)_(single_element)/1000  766.49  762.81  -3.68  -0.48%
rng::find_end(list<int>,_pred)_(single_element)/1024  784.75  781.47  -3.28  -0.42%
rng::find_end(list<int>,_pred)_(single_element)/8192  9722.26  10399.11  676.85  6.96%
rng::find_end(vector<int>)_(match_near_end)/1000  267.82  25.34  -242.48  -90.54%
rng::find_end(vector<int>)_(match_near_end)/1024  2259.46  25.78  -2233.68  -98.86%
rng::find_end(vector<int>)_(match_near_end)/8192  119747.92  214.53  -119533.39  -99.82%
rng::find_end(vector<int>)_(near_matches)/1000  16913.73  14102.20  -2811.53  -16.62%
rng::find_end(vector<int>)_(near_matches)/1024  16097.97  14767.26  -1330.71  -8.27%
rng::find_end(vector<int>)_(near_matches)/8192  1102803.07  823463.30  -279339.78  -25.33%
rng::find_end(vector<int>)_(process_all)/1000  233.43  380.28  146.85  62.91%
rng::find_end(vector<int>)_(process_all)/1024  238.86  389.32  150.46  62.99%
rng::find_end(vector<int>)_(process_all)/1048576  269619.36  391698.75  122079.39  45.28%
rng::find_end(vector<int>)_(process_all)/8192  2011.46  3061.40  1049.94  52.20%
rng::find_end(vector<int>)_(same_length)/1000  632.19  253.50  -378.69  -59.90%
rng::find_end(vector<int>)_(same_length)/1024  556.53  254.87  -301.66  -54.20%
rng::find_end(vector<int>)_(same_length)/8192  4597.26  2095.57  -2501.68  -54.42%
rng::find_end(vector<int>)_(single_element)/1000  231.57  417.64  186.06  80.35%
rng::find_end(vector<int>)_(single_element)/1024  236.41  427.03  190.62  80.63%
rng::find_end(vector<int>)_(single_element)/8192  1918.95  3367.29  1448.33  75.48%
rng::find_end(vector<int>,_pred)_(match_near_end)/1000  581.49  52.67  -528.82  -90.94%
rng::find_end(vector<int>,_pred)_(match_near_end)/1024  3545.40  53.74  -3491.65  -98.48%
rng::find_end(vector<int>,_pred)_(match_near_end)/8192  190482.78  432.30  -190050.48  -99.77%
rng::find_end(vector<int>,_pred)_(near_matches)/1000  28878.24  24723.01  -4155.23  -14.39%
rng::find_end(vector<int>,_pred)_(near_matches)/1024  30035.85  25597.45  -4438.40  -14.78%
rng::find_end(vector<int>,_pred)_(near_matches)/8192  1858596.45  1584796.11  -273800.34  -14.73%
rng::find_end(vector<int>,_pred)_(process_all)/1000  518.92  813.46  294.53  56.76%
rng::find_end(vector<int>,_pred)_(process_all)/1024  531.17  710.20  179.03  33.70%
rng::find_end(vector<int>,_pred)_(process_all)/1048576  674064.13  905070.15  231006.01  34.27%
rng::find_end(vector<int>,_pred)_(process_all)/8192  4254.34  6372.76  2118.43  49.79%
rng::find_end(vector<int>,_pred)_(same_length)/1000  1106.96  526.23  -580.73  -52.46%
rng::find_end(vector<int>,_pred)_(same_length)/1024  1133.60  539.70  -593.90  -52.39%
rng::find_end(vector<int>,_pred)_(same_length)/8192  8988.10  4302.83  -4685.27  -52.13%
rng::find_end(vector<int>,_pred)_(single_element)/1000  528.11  523.69  -4.42  -0.84%
rng::find_end(vector<int>,_pred)_(single_element)/1024  539.58  838.49  298.91  55.40%
rng::find_end(vector<int>,_pred)_(single_element)/8192  4301.43  7313.22  3011.79  70.02%
std::find_end(deque<int>)_(match_near_end)/1000  347.82  38.56  -309.26  -88.91%
std::find_end(deque<int>)_(match_near_end)/1024  3340.80  34.54  -3306.27  -98.97%
std::find_end(deque<int>)_(match_near_end)/8192  171599.83  281.87  -171317.96  -99.84%
std::find_end(deque<int>)_(near_matches)/1000  29703.68  19712.27  -9991.41  -33.64%
std::find_end(deque<int>)_(near_matches)/1024  32312.41  20008.21  -12304.20  -38.08%
std::find_end(deque<int>)_(near_matches)/8192  1851286.99  1216112.34  -635174.65  -34.31%
std::find_end(deque<int>)_(process_all)/1000  256.69  315.96  59.27  23.09%
std::find_end(deque<int>)_(process_all)/1024  260.97  305.42  44.45  17.03%
std::find_end(deque<int>)_(process_all)/1048576  273310.08  309499.13  36189.05  13.24%
std::find_end(deque<int>)_(process_all)/8192  2071.33  2606.57  535.25  25.84%
std::find_end(deque<int>)_(same_length)/1000  1422.58  441.07  -981.51  -68.99%
std::find_end(deque<int>)_(same_length)/1024  1844.27  350.75  -1493.52  -80.98%
std::find_end(deque<int>)_(same_length)/8192  14681.69  2839.26  -11842.43  -80.66%
std::find_end(deque<int>)_(single_element)/1000  291.63  344.82  53.19  18.24%
std::find_end(deque<int>)_(single_element)/1024  257.97  330.19  72.21  27.99%
std::find_end(deque<int>)_(single_element)/8192  2220.10  2505.02  284.92  12.83%
std::find_end(deque<int>,_pred)_(match_near_end)/1000  694.70  69.60  -625.11  -89.98%
std::find_end(deque<int>,_pred)_(match_near_end)/1024  4735.45  71.12  -4664.33  -98.50%
std::find_end(deque<int>,_pred)_(match_near_end)/8192  267417.02  561.03  -266855.99  -99.79%
std::find_end(deque<int>,_pred)_(near_matches)/1000  42199.71  31597.49  -10602.22  -25.12%
std::find_end(deque<int>,_pred)_(near_matches)/1024  38007.49  32362.16  -5645.33  -14.85%
std::find_end(deque<int>,_pred)_(near_matches)/8192  2607708.49  1935799.88  -671908.60  -25.77%
std::find_end(deque<int>,_pred)_(process_all)/1000  599.65  552.71  -46.94  -7.83%
std::find_end(deque<int>,_pred)_(process_all)/1024  615.88  554.17  -61.71  -10.02%
std::find_end(deque<int>,_pred)_(process_all)/1048576  598471.63  599441.79  970.16  0.16%
std::find_end(deque<int>,_pred)_(process_all)/8192  4853.45  4394.20  -459.25  -9.46%
std::find_end(deque<int>,_pred)_(same_length)/1000  1511.68  797.64  -714.04  -47.23%
std::find_end(deque<int>,_pred)_(same_length)/1024  1568.63  810.85  -757.78  -48.31%
std::find_end(deque<int>,_pred)_(same_length)/8192  12609.34  5092.02  -7517.32  -59.62%
std::find_end(deque<int>,_pred)_(single_element)/1000  601.22  628.80  27.58  4.59%
std::find_end(deque<int>,_pred)_(single_element)/1024  613.25  627.15  13.89  2.27%
std::find_end(deque<int>,_pred)_(single_element)/8192  4823.85  4795.25  -28.60  -0.59%
std::find_end(forward_list<int>)_(match_near_end)/1000  762.64  769.74  7.10  0.93%
std::find_end(forward_list<int>)_(match_near_end)/1024  4767.93  4840.87  72.94  1.53%
std::find_end(forward_list<int>)_(match_near_end)/8192  260275.68  260835.21  559.53  0.21%
std::find_end(forward_list<int>)_(near_matches)/1000  38020.76  38197.53  176.77  0.46%
std::find_end(forward_list<int>)_(near_matches)/1024  39028.86  39333.38  304.51  0.78%
std::find_end(forward_list<int>)_(near_matches)/8192  2524921.48  2523470.32  -1451.16  -0.06%
std::find_end(forward_list<int>)_(process_all)/1000  699.95  699.93  -0.02  -0.00%
std::find_end(forward_list<int>)_(process_all)/1024  715.24  712.07  -3.17  -0.44%
std::find_end(forward_list<int>)_(process_all)/1048576  755926.33  756976.31  1049.98  0.14%
std::find_end(forward_list<int>)_(process_all)/8192  5696.72  5672.92  -23.81  -0.42%
std::find_end(forward_list<int>)_(same_length)/1000  1485.84  1480.19  -5.65  -0.38%
std::find_end(forward_list<int>)_(same_length)/1024  1493.62  1516.95  23.33  1.56%
std::find_end(forward_list<int>)_(same_length)/8192  16833.75  13551.42  -3282.33  -19.50%
std::find_end(forward_list<int>)_(single_element)/1000  688.87  675.02  -13.85  -2.01%
std::find_end(forward_list<int>)_(single_element)/1024  688.89  691.59  2.69  0.39%
std::find_end(forward_list<int>)_(single_element)/8192  5735.87  6748.85  1012.98  17.66%
std::find_end(forward_list<int>,_pred)_(match_near_end)/1000  836.01  853.28  17.27  2.07%
std::find_end(forward_list<int>,_pred)_(match_near_end)/1024  5259.92  5299.30  39.39  0.75%
std::find_end(forward_list<int>,_pred)_(match_near_end)/8192  279479.85  285593.49  6113.65  2.19%
std::find_end(forward_list<int>,_pred)_(near_matches)/1000  42577.60  44550.54  1972.94  4.63%
std::find_end(forward_list<int>,_pred)_(near_matches)/1024  44374.19  45697.95  1323.76  2.98%
std::find_end(forward_list<int>,_pred)_(near_matches)/8192  2711138.03  2742988.33  31850.30  1.17%
std::find_end(forward_list<int>,_pred)_(process_all)/1000  752.03  762.75  10.72  1.43%
std::find_end(forward_list<int>,_pred)_(process_all)/1024  767.04  781.48  14.44  1.88%
std::find_end(forward_list<int>,_pred)_(process_all)/1048576  843453.35  861838.82  18385.47  2.18%
std::find_end(forward_list<int>,_pred)_(process_all)/8192  6241.65  6308.05  66.40  1.06%
std::find_end(forward_list<int>,_pred)_(same_length)/1000  2384.18  1589.21  -794.97  -33.34%
std::find_end(forward_list<int>,_pred)_(same_length)/1024  2428.97  1617.17  -811.80  -33.42%
std::find_end(forward_list<int>,_pred)_(same_length)/8192  16961.22  14972.86  -1988.36  -11.72%
std::find_end(forward_list<int>,_pred)_(single_element)/1000  743.31  752.77  9.47  1.27%
std::find_end(forward_list<int>,_pred)_(single_element)/1024  763.62  768.70  5.08  0.67%
std::find_end(forward_list<int>,_pred)_(single_element)/8192  6189.73  6934.04  744.31  12.02%
std::find_end(list<int>)_(match_near_end)/1000  773.76  76.41  -697.35  -90.12%
std::find_end(list<int>)_(match_near_end)/1024  4715.36  69.09  -4646.27  -98.53%
std::find_end(list<int>)_(match_near_end)/8192  264864.51  584.19  -264280.32  -99.78%
std::find_end(list<int>)_(near_matches)/1000  37650.69  35233.45  -2417.24  -6.42%
std::find_end(list<int>)_(near_matches)/1024  39239.25  36699.13  -2540.13  -6.47%
std::find_end(list<int>)_(near_matches)/8192  2543446.71  2252625.27  -290821.44  -11.43%
std::find_end(list<int>)_(process_all)/1000  718.00  724.59  6.59  0.92%
std::find_end(list<int>)_(process_all)/1024  735.14  746.70  11.57  1.57%
std::find_end(list<int>)_(process_all)/1048576  812620.48  869606.78  56986.30  7.01%
std::find_end(list<int>)_(process_all)/8192  8217.98  8462.53  244.55  2.98%
std::find_end(list<int>)_(same_length)/1000  1500.85  716.45  -784.39  -52.26%
std::find_end(list<int>)_(same_length)/1024  1534.13  736.62  -797.51  -51.98%
std::find_end(list<int>)_(same_length)/8192  20274.06  10621.82  -9652.24  -47.61%
std::find_end(list<int>)_(single_element)/1000  717.05  725.64  8.60  1.20%
std::find_end(list<int>)_(single_element)/1024  732.87  742.44  9.57  1.31%
std::find_end(list<int>)_(single_element)/8192  9835.11  11896.39  2061.28  20.96%
std::find_end(list<int>,_pred)_(match_near_end)/1000  845.46  75.09  -770.37  -91.12%
std::find_end(list<int>,_pred)_(match_near_end)/1024  5301.60  77.14  -5224.46  -98.54%
std::find_end(list<int>,_pred)_(match_near_end)/8192  281976.13  648.87  -281327.25  -99.77%
std::find_end(list<int>,_pred)_(near_matches)/1000  44076.98  39576.32  -4500.67  -10.21%
std::find_end(list<int>,_pred)_(near_matches)/1024  45531.64  41020.11  -4511.54  -9.91%
std::find_end(list<int>,_pred)_(near_matches)/8192  2756383.66  2503085.29  -253298.37  -9.19%
std::find_end(list<int>,_pred)_(process_all)/1000  766.06  764.48  -1.58  -0.21%
std::find_end(list<int>,_pred)_(process_all)/1024  780.35  799.51  19.15  2.45%
std::find_end(list<int>,_pred)_(process_all)/1048576  894643.71  898947.94  4304.24  0.48%
std::find_end(list<int>,_pred)_(process_all)/8192  8436.41  9977.74  1541.33  18.27%
std::find_end(list<int>,_pred)_(same_length)/1000  1545.22  784.29  -760.92  -49.24%
std::find_end(list<int>,_pred)_(same_length)/1024  1583.27  808.52  -774.74  -48.93%
std::find_end(list<int>,_pred)_(same_length)/8192  21850.99  10896.50  -10954.48  -50.13%
std::find_end(list<int>,_pred)_(single_element)/1000  752.03  755.00  2.97  0.39%
std::find_end(list<int>,_pred)_(single_element)/1024  774.22  784.14  9.92  1.28%
std::find_end(list<int>,_pred)_(single_element)/8192  10219.43  10396.49  177.05  1.73%
std::find_end(vector<int>)_(match_near_end)/1000  277.37  28.45  -248.91  -89.74%
std::find_end(vector<int>)_(match_near_end)/1024  2247.56  25.80  -2221.76  -98.85%
std::find_end(vector<int>)_(match_near_end)/8192  119785.10  212.44  -119572.66  -99.82%
std::find_end(vector<int>)_(near_matches)/1000  16351.34  14073.13  -2278.21  -13.93%
std::find_end(vector<int>)_(near_matches)/1024  16656.33  14654.36  -2001.97  -12.02%
std::find_end(vector<int>)_(near_matches)/8192  1181392.88  828918.96  -352473.91  -29.84%
std::find_end(vector<int>)_(process_all)/1000  231.14  235.80  4.66  2.01%
std::find_end(vector<int>)_(process_all)/1024  235.87  232.06  -3.81  -1.61%
std::find_end(vector<int>)_(process_all)/1048576  239922.25  238229.38  -1692.87  -0.71%
std::find_end(vector<int>)_(process_all)/8192  1837.43  1802.25  -35.19  -1.91%
std::find_end(vector<int>)_(same_length)/1000  632.59  252.80  -379.79  -60.04%
std::find_end(vector<int>)_(same_length)/1024  524.51  257.58  -266.94  -50.89%
std::find_end(vector<int>)_(same_length)/8192  5159.01  2090.12  -3068.89  -59.49%
std::find_end(vector<int>)_(single_element)/1000  229.56  250.47  20.91  9.11%
std::find_end(vector<int>)_(single_element)/1024  234.86  252.18  17.32  7.37%
std::find_end(vector<int>)_(single_element)/8192  1825.74  1981.90  156.16  8.55%
std::find_end(vector<int>,_pred)_(match_near_end)/1000  574.17  52.98  -521.19  -90.77%
std::find_end(vector<int>,_pred)_(match_near_end)/1024  3525.35  54.03  -3471.32  -98.47%
std::find_end(vector<int>,_pred)_(match_near_end)/8192  190155.81  423.41  -189732.40  -99.78%
std::find_end(vector<int>,_pred)_(near_matches)/1000  28541.98  24598.37  -3943.61  -13.82%
std::find_end(vector<int>,_pred)_(near_matches)/1024  29696.55  25675.27  -4021.28  -13.54%
std::find_end(vector<int>,_pred)_(near_matches)/8192  1846970.41  1596191.84  -250778.57  -13.58%
std::find_end(vector<int>,_pred)_(process_all)/1000  519.71  592.14  72.43  13.94%
std::find_end(vector<int>,_pred)_(process_all)/1024  529.74  491.07  -38.67  -7.30%
std::find_end(vector<int>,_pred)_(process_all)/1048576  631923.41  643729.57  11806.16  1.87%
std::find_end(vector<int>,_pred)_(process_all)/8192  4215.05  3909.30  -305.75  -7.25%
std::find_end(vector<int>,_pred)_(same_length)/1000  1095.46  524.99  -570.47  -52.08%
std::find_end(vector<int>,_pred)_(same_length)/1024  1117.95  537.65  -580.31  -51.91%
std::find_end(vector<int>,_pred)_(same_length)/8192  8923.95  4307.13  -4616.83  -51.74%
std::find_end(vector<int>,_pred)_(single_element)/1000  516.52  656.32  139.80  27.07%
std::find_end(vector<int>,_pred)_(single_element)/1024  528.82  673.72  144.90  27.40%
std::find_end(vector<int>,_pred)_(single_element)/8192  4210.37  5529.52  1319.15  31.33%
Geomean  6995.43  3440.97  -3554.46  -50.81%
```
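The large wins in the `match_near_end` rows are consistent with searching from the back when the iterators allow it. A minimal sketch of that general idea for bidirectional iterators follows; `findEndBackward` is a hypothetical helper built on `std::search` over reverse iterators, and it is not libc++'s implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper; a sketch of a backward search, not libc++'s code.
// Requires bidirectional iterators. Returns last1 when the needle is empty
// or does not occur, matching std::find_end's documented result.
template <class BidirIt1, class BidirIt2>
BidirIt1 findEndBackward(BidirIt1 first1, BidirIt1 last1,
                         BidirIt2 first2, BidirIt2 last2) {
  if (first2 == last2)
    return last1;

  using Rev1 = std::reverse_iterator<BidirIt1>;
  using Rev2 = std::reverse_iterator<BidirIt2>;

  // The first match of the reversed needle in the reversed haystack is the
  // last match in forward order.
  Rev1 rlast(first1);
  Rev1 hit = std::search(Rev1(last1), rlast, Rev2(last2), Rev2(first2));
  if (hit == rlast)
    return last1;

  // hit.base() is one past the final element of the forward match, so step
  // back by the needle length to reach its first element.
  return std::prev(hit.base(), std::distance(first2, last2));
}
```

When the last match is close to the end of the haystack, this stops after inspecting only a few elements, whereas a forward scan has to walk the whole range.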
…iginally legal f64 values that we can store directly. (llvm#171602) Based off feedback from llvm#171478
…m#171637) They were using the wrong scheduler resource. They're also missing from the optimisation guides, but WriteLD should be closer at least.
…lvm#169914) This is technically ABI breaking, since `is_trivial` and `is_trivially_default_constructible` now return different results. However, I don't think that's a significant issue, since `allocator` is almost always used in classes which own memory, making them non-trivial anyways.
…m#169413) We've seen in quite a few cases while optimizing `__tree`'s copy construction that `_DetachedTreeCache` is actually quite slow and not necessarily an optimization at all. This patch removes the code, since it's now only used by `operator=(initializer_list)`, which should be quite cold code. We might look into actually optimizing it again in the future, but I doubt an optimization will be small enough compared to the likely speedup in real-world code this would give.
…lvm#165160) This removes a bit of code duplication and might simplify future segmented iterator optimizations.
Adding Annotation Inference in Lifetime Analysis.
This PR implicitly adds lifetime bound annotations to the AST, which are
then used when analyzing functions that are parsed later, to detect UARs etc.
Example:
```cpp
#include <string>
#include <string_view>

std::string_view f1(std::string_view a) {
  return a;
}
std::string_view f2(std::string_view a) {
  return f1(a);
}
std::string_view ff(std::string_view a) {
  std::string stack = "something on stack";
  return f2(stack); // warning: address of stack memory is returned
}
```
Note:
1. We only add lifetime bound annotations to the functions being
analyzed currently.
2. Currently, both annotation suggestion and inference work
simultaneously. This can be modified based on requirements.
3. The current approach works provided that functions already appear
in the correct order (callee before caller). For less ideal cases, we
can create a CallGraph prior to running the analysis. This can be done
in the next PR.
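The ordering requirement in note 3 can be illustrated with a minimal sketch. It assumes a hypothetical call graph stored as a plain map from function name to callee names; it does not use Clang's `CallGraph` class or any part of the actual analysis. A depth-first post-order walk emits every callee before its callers, which is the order in which inferred annotations would already be available.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order visit: every callee of `fn` is appended before `fn`.
static void visit(const CallGraph &graph, const std::string &fn,
                  std::set<std::string> &seen,
                  std::vector<std::string> &order) {
  if (!seen.insert(fn).second)
    return; // Already visited; this also stops recursion on cycles.
  auto it = graph.find(fn);
  if (it != graph.end())
    for (const std::string &callee : it->second)
      visit(graph, callee, seen, order);
  order.push_back(fn);
}

// Returns the functions in callee-before-caller order.
std::vector<std::string> calleeBeforeCaller(const CallGraph &graph) {
  std::set<std::string> seen;
  std::vector<std::string> order;
  for (const auto &entry : graph)
    visit(graph, entry.first, seen, order);
  return order;
}
```

For the example above (`ff` calls `f2`, `f2` calls `f1`), this yields `f1`, `f2`, `ff`. Mutually recursive functions have no valid callee-before-caller order; the sketch simply breaks the cycle at the first function it reaches.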
Depends upon llvm#170900
Re-land llvm#169544
Previously we were less specific for POINTER/TARGET: encoding that they could alias with (almost) anything.
In the new system, the "target data" tree is now a sibling of the other trees (e.g. "global data"). POINTER variables go at the root of the "target data" tree, whereas TARGET variables get their own nodes under that tree.
For example,
```
integer, pointer :: ip
real, pointer :: rp
integer, target :: it
integer, target :: it2(:)
real, target :: rt
integer :: i
real :: r
```
- `ip` and `rp` may alias with any variable except `i` and `r`.
- `it`, `it2`, and `rt` may alias only with `ip` or `rp`.
- `i` and `r` cannot alias with any other variable.
Fortran 2023 15.5.2.14 gives restrictions on entities associated with dummy arguments. These do not allow non-target globals to be modified through dummy arguments and therefore I don't think we need to make all globals alias with dummy arguments.
I haven't implemented it in this patch, but I wonder whether it is ever possible for `ip` to alias with `rt`.
While I was updating the tests I fixed up some tests that still assumed that local alloc tbaa wasn't the default.
Cray pointers/pointees are (optionally) modelled as aliasing with all non-descriptor data. This is not enabled by default.
I found no functional regressions in the gfortran test suite.
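The aliasing rules listed above follow from the usual tree-based TBAA rule: two accesses may alias only if one node is an ancestor of (or the same as) the other. The following is a simplified sketch of that rule, not flang's implementation. The `Node` type and helper names are hypothetical, and placing `i` and `r` under "global data" is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical model of a TBAA-style tree node.
struct Node {
  std::string name;
  const Node *parent; // nullptr for the root
};

// True if `ancestor` is `node` itself or lies on the path from `node` to the root.
bool isAncestorOrSelf(const Node *ancestor, const Node *node) {
  for (const Node *n = node; n != nullptr; n = n->parent)
    if (n == ancestor)
      return true;
  return false;
}

// Tree-based rule: accesses may alias only if one node is on the other's root path.
bool mayAlias(const Node *a, const Node *b) {
  return isAncestorOrSelf(a, b) || isAncestorOrSelf(b, a);
}

// Builds the tree for the Fortran example and checks the three listed rules.
bool exampleMatchesDescription() {
  Node root{"any data access", nullptr};
  Node globalData{"global data", &root}; // assumed home of `i` and `r`
  Node targetData{"target data", &root};
  const Node *ip = &targetData; // POINTER variables sit at the tree root
  const Node *rp = &targetData;
  Node it{"it", &targetData}; // TARGET variables get their own nodes
  Node rt{"rt", &targetData};
  Node i{"i", &globalData};
  Node r{"r", &globalData};
  return mayAlias(ip, rp) && mayAlias(ip, &it) && mayAlias(rp, &rt) &&
         !mayAlias(&it, &rt) && !mayAlias(ip, &i) && !mayAlias(&i, &r);
}
```

Because `ip` and `rp` map to the root of the "target data" tree, they are ancestors of every TARGET node, while sibling TARGET nodes such as `it` and `rt` never lie on each other's root path.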
…m#170323) (llvm#171787)
```
Step 7 (test-check-all) failure: Test just built components: check-all completed (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/insert_vector_dynelt.ll' FAILED ********************
Exit Code: 1
Command Output (stdout):
--
# RUN: at line 2
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn -mcpu=fiji < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -enable-var-scope -check-prefixes=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
# executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn -mcpu=fiji
# executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -enable-var-scope -check-prefixes=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
# RUN: at line 3
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -O0 -mtriple=amdgcn -mcpu=fiji < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck --check-prefixes=GCN-O0 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
# executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -O0 -mtriple=amdgcn -mcpu=fiji
# .---command stderr------------
# |
# | # After Instruction Selection
# | # Machine code for function insert_dyn_i32_6: IsSSA, TracksLiveness
# | Function Live Ins: $sgpr16 in %8, $sgpr17 in %9, $sgpr18 in %10, $sgpr19 in %11, $sgpr20 in %12, $sgpr21 in %13, $vgpr0 in %14, $vgpr1 in %15
# |
# | bb.0 (%ir-block.0):
# | successors: %bb.1(0x80000000); %bb.1(100.00%)
# | liveins: $sgpr16, $sgpr17, $sgpr18, $sgpr19, $sgpr20, $sgpr21, $vgpr0, $vgpr1
# | %15:vgpr_32 = COPY $vgpr1
# | %14:vgpr_32 = COPY $vgpr0
# | %13:sgpr_32 = COPY $sgpr21
# | %12:sgpr_32 = COPY $sgpr20
# | %11:sgpr_32 = COPY $sgpr19
# | %10:sgpr_32 = COPY $sgpr18
# | %9:sgpr_32 = COPY $sgpr17
# | %8:sgpr_32 = COPY $sgpr16
# | %17:sgpr_192 = REG_SEQUENCE %8:sgpr_32, %subreg.sub0, %9:sgpr_32, %subreg.sub1, %10:sgpr_32, %subreg.sub2, %11:sgpr_32, %subreg.sub3, %12:sgpr_32, %subreg.sub4, %13:sgpr_32, %subreg.sub5
# | %16:sgpr_192 = COPY %17:sgpr_192
# | %19:vreg_192 = COPY %17:sgpr_192
# | %28:sreg_64_xexec = IMPLICIT_DEF
# | %27:sreg_64_xexec = S_MOV_B64 $exec
# |
# | bb.1:
# | ; predecessors: %bb.1, %bb.0
# | successors: %bb.1(0x40000000), %bb.3(0x40000000); %bb.1(50.00%), %bb.3(50.00%)
# |
# | %26:vreg_192 = PHI %19:vreg_192, %bb.0, %18:vreg_192, %bb.1
# | %29:sreg_64 = PHI %28:sreg_64_xexec, %bb.0, %30:sreg_64, %bb.1
# | %31:sreg_32_xm0 = V_READFIRSTLANE_B32 %14:vgpr_32, implicit $exec
# | %32:sreg_64 = V_CMP_EQ_U32_e64 %31:sreg_32_xm0, %14:vgpr_32, implicit $exec
# | %30:sreg_64 = S_AND_SAVEEXEC_B64 killed %32:sreg_64, implicit-def $exec, implicit-def $scc, implicit $exec
# | $m0 = COPY killed %31:sreg_32_xm0
# | %18:vreg_192 = V_INDIRECT_REG_WRITE_MOVREL_B32_V8 %26:vreg_192(tied-def 0), %15:vgpr_32, 3, implicit $m0, implicit $exec
# | $exec = S_XOR_B64_term $exec, %30:sreg_64, implicit-def $scc
# | S_CBRANCH_EXECNZ %bb.1, implicit $exec
# |
# | bb.3:
```
This reverts commit 15df9e7.
…m#171725) Currently fmul is not reassociated unless it has nsz, although this should be unnecessary.
…lvm#171158) Add an additional bound for the induction variable of scf.forall such that: %iv <= %lower_bound + (%trip_count - 1) * step. This is the same as llvm#126426, but for scf.forall loops.
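As a small worked illustration of that bound (plain integer arithmetic, not the MLIR value-bounds implementation), the sketch below computes the trip count of a loop over a half-open range and the largest value the induction variable can reach. The function names are hypothetical, and the code assumes a positive step and ignores overflow.

```cpp
#include <cassert>
#include <cstdint>

// Trip count of a loop over [lowerBound, upperBound) with a positive step.
int64_t computeTripCount(int64_t lowerBound, int64_t upperBound, int64_t step) {
  if (upperBound <= lowerBound)
    return 0;
  return (upperBound - lowerBound + step - 1) / step; // ceiling division
}

// Largest induction variable value: lower_bound + (trip_count - 1) * step.
// Assumes tripCount >= 1.
int64_t maxInductionValue(int64_t lowerBound, int64_t tripCount, int64_t step) {
  return lowerBound + (tripCount - 1) * step;
}
```

For a loop from 0 to 10 with step 4, the induction variable takes the values 0, 4, and 8, so the bound evaluates to 8. That is tighter than the `upper_bound - 1 = 9` one would get from the upper bound alone.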
This patch also updates the lowering of the `id`-based pmevent to use intrinsics. The mask is simply (1 << event-id). Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
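The id-to-mask conversion can be sketched as follows. This is an illustration only, not the lowering code from the patch: the function name is hypothetical, and the 16-bit mask with event ids limited to 0 through 15 is an assumption made here.

```cpp
#include <cassert>
#include <cstdint>

// Converts an event id to a one-hot mask: mask = 1 << event-id.
// Assumption: event ids are in [0, 15], so the mask fits in 16 bits.
uint16_t pmeventMaskFromId(unsigned eventId) {
  assert(eventId < 16 && "event id outside the assumed range");
  return static_cast<uint16_t>(1u << eventId);
}
```

For example, event id 3 maps to the mask `0x0008`, and event id 15 maps to `0x8000`.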
This function contains most of the logic for BTI:
- It takes the BasicBlock and the instruction used to jump to it.
- It then checks whether the first non-pseudo instruction is a sufficient landing pad for the used call.
- If not, it generates the correct BTI instruction.
Also introduce the isCallCoveredByBTI helper to simplify the logic.
nsz can only change the behavior of the sign bit. The sign bit for fmul can be implemented as xor, which is associative. DAGCombiner already reassociates the multiply by 2 constants without nsz. Fixes llvm#64967
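The sign-bit argument can be checked with a short sketch. It is an illustration rather than part of the patch, and the function names are hypothetical. For non-NaN results, the sign of a product is the XOR of the operand signs, so both association orders of `a * b * c` produce the same sign bit even when zeros are involved. The rounded magnitudes may still differ between the two orders, which is a separate concern from nsz.

```cpp
#include <cassert>
#include <cmath>

// Sign of a product predicted from the operand signs alone (non-NaN inputs).
bool productSignByXor(double a, double b) {
  return std::signbit(a) != std::signbit(b); // XOR on booleans
}

// Checks that both association orders of a * b * c have the same sign bit and
// that it matches the XOR of the three operand sign bits.
// Assumes no intermediate result is NaN (e.g. no 0 * infinity).
bool signAgreesUnderReassociation(double a, double b, double c) {
  bool left = std::signbit((a * b) * c);
  bool right = std::signbit(a * (b * c));
  bool predicted = productSignByXor(a, b) != std::signbit(c);
  return left == right && left == predicted;
}
```

With `a = -0.0`, `b = -1.5`, `c = 0.0`, the left association gives `(+0.0) * 0.0 = +0.0` and the right association gives `-0.0 * -0.0 = +0.0`, so the signed-zero result is the same either way.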
This patch adds TLS support for SystemZ on top of the orc-runtime support. The orc-runtime support was split out into a separate PR, llvm#171062, from the earlier TLS support PR [llvm#170706](llvm#170706). See the conversation in [llvm#170706](llvm#170706) --------- Co-authored-by: anoopkg6 <anoopkg6@github.com>
llvm#171797) This patch fixes toolchain-msvc.test on Windows ARM64 hosts running in a native ARM64 environment set up via vcvarsarm64.bat. Our lab buildbot recently switched from the cross vcvarsamd64_arm64.bat environment to the native vcvarsarm64.bat one. This patch updates the FileCheck patterns to also allow HostARM64 and arm64 PATH entries.
Changes:
- Extend the host regex to match HostARM64 (case-insensitive).
- Allow arm64 in the PATH tail.
- Apply the same fix in both the 32-bit and 64-bit sections.
Collaborator
Author
ronlieb
approved these changes
Dec 11, 2025