[AutoBump] Merge with 357c1970 (Sep 30) (3) #439

mgehre-amd · 2025-01-10T12:27:26Z

No description provided.

…Type` (llvm#109435) This PR fixes a bug in `SparseTensorDimOpRewriter` when `tensor.dim` has an unranked tensor type. To prevent crashes, we now use `tryGetSparseTensorType` instead of `getSparseTensorType`. Fixes llvm#107807.

Align i128s to 16 bytes, following the example at https://reviews.llvm.org/D86310. clang already does this implicitly, but do it in backend code too for the benefit of other frontends (see e.g llvm#102783 & rust-lang/rust#128950).
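To show why the backend and the frontends need to agree on i128 alignment, here is a minimal sketch of C-style sequential struct layout. It assumes natural layout rules (each field padded to its own alignment, the struct padded to its strictest alignment) and is not LLVM's DataLayout code. The helper `layout` is hypothetical. It takes (size, align) pairs in bytes and returns the field offsets and padded total size for a `{ i8, i128 }` struct under 8-byte and 16-byte i128 alignment.

```python
def layout(fields):
    """Return (field_offsets, total_size) for fields given as (size, align) pairs in bytes."""
    offset = 0
    max_align = 1
    offsets = []
    for size, align in fields:
        # Round the running offset up to the field's alignment.
        offset = (offset + align - 1) // align * align
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    # Pad the struct so its size is a multiple of its strictest alignment.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# struct { i8; i128 } with i128 aligned to 8 bytes vs. 16 bytes
print(layout([(1, 1), (16, 8)]))   # ([0, 8], 24)
print(layout([(1, 1), (16, 16)]))  # ([0, 16], 32)
```

If one side lays the struct out with 8-byte alignment and the other with 16-byte alignment, the i128 field's offset and the struct's size both disagree. This is why it helps for the backend default to match what clang already does.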

…0` when `X != Y` (llvm#110413) Alive2: https://alive2.llvm.org/ce/z/9oDP6K I found this pattern in https://github.com/casadi/casadi/blob/04e75858d7e626dda62d83b862fc89fc26f52745/casadi/core/repmat.cpp#L70-L78.
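The fold is correct because, when neither multiplication wraps, `X * Z == Y * Z` implies `(X - Y) * Z == 0` in exact integer arithmetic, and since `X != Y` this forces `Z == 0`. The following is a minimal brute-force sketch of that argument over 4-bit integers. It is an illustration only, not the Alive2 proof linked above. Inputs where either multiplication would wrap are skipped, because a wrapping `nuw`/`nsw` multiply produces poison. All helper names are hypothetical.

```python
BITS = 4
MASK = (1 << BITS) - 1
SMIN, SMAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def to_signed(v):
    """Reinterpret a BITS-wide unsigned value as two's-complement signed."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def mul_nuw_ok(a, b):
    # No unsigned wrap: the exact product fits in BITS bits.
    return a * b <= MASK


def mul_nsw_ok(a, b):
    # No signed wrap: the exact signed product fits in the signed range.
    return SMIN <= to_signed(a) * to_signed(b) <= SMAX


def fold_holds(no_wrap):
    """Check (X * Z == Y * Z) <=> (Z == 0) for all X != Y where neither multiply wraps."""
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            if x == y:
                continue
            for z in range(MASK + 1):
                if not (no_wrap(x, z) and no_wrap(y, z)):
                    continue  # a wrapping multiply would be poison, so any result is allowed
                before = ((x * z) & MASK) == ((y * z) & MASK)
                after = z == 0
                if before != after:
                    return False
    return True


print(fold_holds(mul_nuw_ok), fold_holds(mul_nsw_ok))  # True True
```

Passing a check that ignores wrapping (`lambda a, b: True`) makes `fold_holds` return False. For example, with X = 1, Y = 9, Z = 2, both products are 2 modulo 16 even though Z is nonzero, which is why the no-wrap flags are required.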

…FC) (llvm#110432) I'm trying to speed up the reaching def analysis by changing the underlying data structure. Turning MBBReachingDefsInfo into a proper class decouples the data structure and its users. This patch does not change the existing three-dimensional vector structure. --------- Co-authored-by: Nikita Popov <[email protected]>
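The decoupling idea can be sketched as a small class that keeps the same three-level storage (block, then register unit, then list of defs) behind methods, so callers no longer index the nested vector directly and the storage can change later without touching them. `ReachingDefsTable` and its method names are hypothetical illustrations, not the LLVM API.

```python
class ReachingDefsTable:
    """Hypothetical reaching-defs container: per block, per register unit, a list of def positions."""

    def __init__(self, num_blocks: int) -> None:
        # Three-dimensional storage: block -> register unit -> list of defs.
        self._defs = [[] for _ in range(num_blocks)]

    def start_block(self, block: int, num_reg_units: int) -> None:
        self._defs[block] = [[] for _ in range(num_reg_units)]

    def append(self, block: int, unit: int, def_pos: int) -> None:
        self._defs[block][unit].append(def_pos)

    def defs(self, block: int, unit: int) -> tuple:
        # Return a tuple copy so callers cannot depend on the underlying list.
        return tuple(self._defs[block][unit])


table = ReachingDefsTable(num_blocks=2)
table.start_block(0, num_reg_units=3)
table.append(0, 1, 4)
table.append(0, 1, 7)
print(table.defs(0, 1))  # (4, 7)
```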

llvm#110272) Currently, callers of analyze can't get detailed information about a missing header, e.g. its resolved path. The only way to get at this is to use the low-level walkUsed function, which is much more complicated than just calling analyze. This enables further analysis: e.g. when includes are spelled relative to inner directories, the caller can still know their path relative to the repository root.

This matches the type name defined in this header.

… `__hlsl_resource_t` builtin type (llvm#110079) Replace `element_type*` handles in HLSLExternalSemaSource with the `__hlsl_resource_t` builtin type. The handle used to be defined as `element_type*`, which was used by the provisional subscript operator implementation. Now that the handle is `__hlsl_resource_t`, the subscript placeholder implementation was updated to add an `element_type* e;` field to the resource struct and return a reference to that. This field is just a temporary workaround until indexing is implemented properly in llvm#95956, at which point the field will be removed. This seemed like a better solution than disabling many of the existing tests that already use the `[]` operator. One test has to be disabled nevertheless because of an error arising from interactions between const and template instantiation (a potential bug that can be investigated once indexing is implemented the right way). Fixes llvm#84824

Windows uses a different path separator than Unix-based operating systems. One of the tests in debug-compilation-unit.ll lacked a Windows-compatible '\\' variant, which broke the test suite on that OS.

…m#107190) The DominatorTree version is marked for deprecation, so we use the DomTreeUpdater version. We also update sinkRegion() to iterate over basic blocks instead of DomTreeNodes. The loop body calls SplitBlockPredecessors. The DTU version calls DomTreeUpdater::apply_updates(), which may call DominatorTree::reset(). This invalidates the worklist of DomTreeNodes to iterate over.

…lvm#110083) According to ChuanqiXu9/clangd-for-modules#9, I was surprised to find that the support for C++20 modules doesn't handle code completion well. After debugging, I found two problems: (1) We forgot to call `adjustHeaderSearchOptions` in code complete. This may be an easy oversight. (2) In `CodeCompleteOptions::getClangCompleteOpts`, we may set `LoadExternal` to false when an index is available. But we support modules with an index, so the two conflict. Given that modules are opt-in now, I think it makes sense to set LoadExternal to true when modules are enabled. This is a small fix and I hope it can land quickly.

Fix InferAddressSpaces asserting on a load of a vector of flat pointers. Fixes llvm#110433

….PointerSub' (llvm#107596)

llvm#110247) …aint that depends on a template parameter from an enclosing template as members of the enclosing class. Such function templates should be considered member-like constrained friends per [temp.friend]p9 and itanium-cxx-abi/cxx-abi#24 (comment)).

…lvm#106342) This patch separates the computation of the final reduction result from the intermediate stores of the reduction. --------- Co-authored-by: Florian Hahn <[email protected]>

Pass EarliestEscapeInfo to BatchAA in MemCpyOpt. This allows memcpy elimination in cases where one of the involved pointers is captured after the relevant memcpy/call.

llvm#109680) …me lowering SME instructions can only be used in streaming mode. The PTRUE for the predicated counter and the ld/st pair can be used when sve2.1 is available, or when sme2 is available and the function is in streaming mode. Previously, the frame lowering only checked whether sme2 was available when building the machine instruction. This fix checks both that sme2 is available and that the subtarget is in streaming mode.
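The availability condition in this message reduces to a small boolean predicate. Here is a minimal sketch, with a hypothetical function name and plain booleans standing in for the real subtarget queries:

```python
def can_use_pair_and_counter_ptrue(has_sve2p1: bool, has_sme2: bool, is_streaming: bool) -> bool:
    # sve2.1 alone is enough; sme2 only qualifies when the function runs in streaming mode.
    return has_sve2p1 or (has_sme2 and is_streaming)


# The case the old check accepted wrongly: sme2 available, but not in streaming mode.
print(can_use_pair_and_counter_ptrue(False, True, False))  # False
```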

SimplifyCFG store speculation currently has some homegrown code to check for a writable object, handling the alloca special case only. Switch it to use the generic isWritableObject() API, which means that we also support byval arguments, allocator return values, and writable arguments. I've adjusted isWritableObject() to also check for the noalias attribute when handling writable. Otherwise, I don't think that we can generalize from at-entry writability. This was not relevant for previous uses of the function, because they'd already require noalias for other reasons anyway.

The patch llvm#109680 is failing because of the test sme-callee-save-restore-pairs.ll. This patch fixes the output of the test.

A large scratch offset with the highest bit set is selected as negative, because a negative offset has the same 16-bit binary representation as a large unsigned offset.

llvm#110467) …edefinition failures.

…lvm#110256) Use i32 for offset instead of i16, this way it does not get interpreted as negative 16 bit offset.
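The 16-bit ambiguity behind the two AMDGPU scratch-offset messages above can be reproduced with Python's `struct` module. The same bit pattern reads as a negative number when treated as a signed 16-bit value, but as the intended large offset once it is carried in 32 bits. The value 0x8000 is just an example.

```python
import struct

offset = 0x8000  # example: 32768, a large unsigned offset with the highest bit set

# Reinterpret the same bits as a signed 16-bit value and as a signed 32-bit value.
as_i16 = struct.unpack("<h", struct.pack("<H", offset))[0]
as_i32 = struct.unpack("<i", struct.pack("<I", offset))[0]
print(as_i16, as_i32)  # -32768 32768
```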

We do not expect to see live carry-out outputs on these adds, so add a dead flag. Split the test for the degenerate case. This makes it more apparent that a regression in a future commit does not matter.

…cer shared by terminator ops (llvm#110105) -- This commit extends consumer fusion to take place even if the producer has multiple uses. -- Allowing multiple uses of the producer means that, besides the consumer op in question, the only other permitted uses of the producer are in: 1. scf.yield 2. tensor.parallel_insert_slice Signed-off-by: Abhishek Varma <[email protected]>
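Here is a minimal sketch of the use check described in this message. It assumes a simplified model where each user of the producer is represented only by its op name. The function name and this representation are hypothetical, not the MLIR API.

```python
# Hypothetical model: each user of the producer is represented by its op name.
ALLOWED_OTHER_USERS = {"scf.yield", "tensor.parallel_insert_slice"}


def can_fuse_with_multiple_uses(user_names: list, consumer_index: int) -> bool:
    """True if every user other than the consumer being fused is one of the allowed ops."""
    return all(
        name in ALLOWED_OTHER_USERS
        for i, name in enumerate(user_names)
        if i != consumer_index
    )


print(can_fuse_with_multiple_uses(["linalg.generic", "scf.yield"], 0))   # True
print(can_fuse_with_multiple_uses(["linalg.generic", "linalg.map"], 0))  # False
```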

llvm#110242) Change the names of the TableGen features to match the names used by AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed by a single workgroup. Add some explanatory comments. NFC.

…109424) Our support for derived types uses `getTypeSizeAndAlignment` to calculate the offset of the members. The `fir.box` was not supported in that function, which meant that any member that required a descriptor was not supported in the derived type. We convert the type into an llvm type and then use the DataLayout to calculate the size/offset of a member. There is no dependency on `getTypeSizeAndAlignment` to get the size of the types. There are 2 other changes in this PR: 1. The `recID` field is used to handle cases where a member references its parent type. 2. A type cache is maintained to avoid duplication. It is also needed for the circular reference case. Fixes llvm#108001.

Follow the same patterns as the other min/max variants.

…m#110108) Most PAuth-related code counts the instructions being inserted and asserts that no more bytes are emitted than the size returned by the getInstSizeInBytes(MI) method. This check seems useful not only for PAuth-related instructions. Also, reimplementing it globally in AArch64AsmPrinter makes it more robust and simplifies further refactoring of PAuth-related code.

When combining two geps into one by adding the offsets, we have to take some care when intersecting the flags, because nusw flags cannot be straightforwardly preserved. Add a helper for this on GEPNoWrapFlags so we won't have to repeat this logic in various places.
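Here is a minimal sketch of why the flags need care. It uses a bitmask model of the flags with hypothetical names and treats inbounds as implying nusw. The rule shown is an assumed reading, not the helper's actual code: intersect the two flag sets, then drop nusw unless inbounds survives. Two offsets that each avoid signed overflow on their own can still overflow when added together. With inbounds, both offsets stay within the same object, so their sum cannot overflow either.

```python
# Hypothetical bitmask model of GEP no-wrap flags.
INBOUNDS = 0b001  # always paired with NUSW below, since inbounds implies nusw
NUSW = 0b010
NUW = 0b100


def flags_for_combined_gep(outer: int, inner: int) -> int:
    """Flags for (gep p, x + y), given the flags of the two geps in (gep (gep p, x), y)."""
    res = outer & inner
    if not (res & INBOUNDS) and (res & NUSW):
        # Each offset alone may avoid signed overflow while their sum overflows, so drop nusw.
        res &= ~NUSW
    return res


print(flags_for_combined_gep(INBOUNDS | NUSW, INBOUNDS | NUSW | NUW))  # 3 (inbounds | nusw)
print(flags_for_combined_gep(NUSW | NUW, NUSW | NUW))                  # 4 (nuw only)
```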

…lvm#110672) As a proxy criterion, mesa targets have unaligned-access-mode (which determines whether the hardware allows unaligned memory accesses) not set whereas amdhsa targets do. This PR changes tests to use amdhsa instead of mesa and inserts additional checks with unaligned-access-mode unset explicitly. This is in preparation for PR llvm#110219, which will generate different code depending on the unaligned-access-mode.

…erlapping Def/Use (llvm#109875) The current RP handling for uses of an MI that overlap with defs is confusing and unnecessary. Moreover, the lane masks do not accurately model the liveness behavior of the subregs. This cleans things up a bit and more accurately models subreg lane liveness by sinking the use handling into the subsequent Uses loop.
The effect of this PR is to replace
A. `increaseRegPressure(Reg, LiveAfter, ~LiveAfter & LiveBefore)`
with
B. `increaseRegPressure(Reg, LiveAfter, LiveBefore)`
Note that A (Defs loop) and B (Uses loop) have different definitions of LiveBefore:
A. `LiveBefore = (LiveAfter & ~DefLanes) | UseLanes`
B. `LiveBefore = LiveAfter | UseLanes`
Also note that `increaseRegPressure` will exit if `PrevMask` (`LiveAfter` for both A and B) has any active lanes, so these calls only have an effect if `LiveAfter` is 0. Substituting `LiveAfter = 0` shows that both compute the same NewMask:
A. `NewMask = ~LiveAfter & ((LiveAfter & ~DefLanes) | UseLanes) = ~0 & (0 | UseLanes) = UseLanes`
B. `NewMask = LiveAfter | UseLanes = 0 | UseLanes = UseLanes`
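The equivalence argument for the case where `LiveAfter` is 0 can be checked exhaustively on small lane masks. This minimal sketch assumes 4-bit masks, with `FULL` as the all-lanes mask. The function names are hypothetical, and the code only models the two NewMask formulas, not `increaseRegPressure` itself.

```python
LANES = 4
FULL = (1 << LANES) - 1  # all-lanes mask, i.e. ~0 restricted to LANES bits


def new_mask_a(live_after, def_lanes, use_lanes):
    # Defs loop: LiveBefore = (LiveAfter & ~DefLanes) | UseLanes, NewMask = ~LiveAfter & LiveBefore
    live_before = (live_after & ~def_lanes & FULL) | use_lanes
    return ~live_after & FULL & live_before


def new_mask_b(live_after, use_lanes):
    # Uses loop: LiveBefore = LiveAfter | UseLanes, passed directly as NewMask
    return live_after | use_lanes


def check_equivalence_when_dead():
    """With LiveAfter == 0, both formulas should yield exactly UseLanes."""
    for def_lanes in range(FULL + 1):
        for use_lanes in range(FULL + 1):
            a = new_mask_a(0, def_lanes, use_lanes)
            b = new_mask_b(0, use_lanes)
            if not (a == b == use_lanes):
                return False
    return True


print(check_equivalence_when_dead())  # True
```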

…lauses (llvm#109809) This patch updates printing and parsing of operations including clauses that define entry block arguments to the operation's region. This impacts `in_reduction`, `map`, `private`, `reduction` and `task_reduction`. The proposed representation to be used by all such clauses is the following:
```
<clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) {
  ...
}
```
The `byref` tag is only allowed for reduction-like clauses, and the `@<sym>` is required and only allowed for the `private` and reduction-like clauses. The `map` clause accepts neither of these. This change fixes some currently broken op representations, like `omp.teams` or `omp.sections` reduction:
```
omp.teams reduction([byref] @<sym> -> %value : <type>) {
^bb0(%block_arg : <type>):
  ...
}
```
Additionally, it addresses some redundancy in the representation of the previously mentioned cases, as well as e.g. `map` in `omp.target`. The problem is that the block argument name after the arrow is not checked in any way, which makes some misleading representations legal:
```mlir
omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr):
  ...
}
```
In that case, `%x` maps to `%arg0`, contrary to what the representation states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it would likely cause issues if used anywhere inside of the operation's region. The solution implemented in this patch makes it so that values introduced after the arrow on the representation of these clauses implicitly define the corresponding entry block arguments, removing the potential for these problematic representations. This is what is already implemented for the `private` and `reduction` clauses of `omp.parallel`.
There are a couple of consequences of this change:
- Entry block argument-defining clauses must come at the end of the operation's representation and in alphabetical order. This is because they are printed/parsed as part of the region, and a standardized ordering is needed to reliably match op arguments with their corresponding entry block arguments via the `BlockArgOpenMPOpInterface`.
- We can no longer define per-clause assembly formats to be reused by all operations that take these clauses, since they must be passed to a custom printer including the region and arguments of all other entry block argument-defining clauses. Code duplication and the potential for introducing issues are minimized by providing the generic `{print,parse}BlockArgRegion` helpers and associated structures.
MLIR and Flang lowering unit tests are updated due to changes in the order and formatting of impacted operations.

Since no passes compute DependenceAnalysis via the PassManager, there is no value in preserving it here. Hence, strip the unnecessary dependency on DependenceAnalysis.

…llvm#110562) For setScore, the root function is setScoreByInterval with a RegInterval input. For determineWait, the root function is determineWait with a RegInterval input.

…m#109810) This patch updates the `omp.target_data` operation to use the same formatting as `map` clauses on `omp.target` for `use_device_addr` and `use_device_ptr`. This is done so the mapping that is being enforced between op arguments and associated entry block arguments is explicit. The way it is achieved is by marking these clauses as entry block argument-defining and adjusting printer/parsers accordingly. As a result of this change, block arguments for `use_device_addr` come before those for `use_device_ptr`, which is the opposite of the previous undocumented situation. Some unit tests are updated based on this change, in addition to those updated because of the format change.

`Type::getPointerTo()` is to be deprecated & removed soon.

…lvm#110067) llvm#60481

…lvm#109811) This patch adds general information on the proposed approach to unify the handling and representation of clauses that define entry block arguments attached to operations that accept them.

None of these tested the case where the non-frame index operand was a register.

) The `omp.section` operation is an outlier in that the block arguments it has are defined by clauses on the required parent `omp.sections` operation. This patch updates the definition of this operation introducing the `BlockArgOpenMPOpInterface` to simplify the handling and verification of these block arguments, implemented based on the parent `omp.sections`.

…lvm#109028)

…vm#110573) Decrease the code size of the `Intrinsic::getAttributes` function by uniquing the function and argument attributes separately and using the `IntrinsicsToAttributesMap` to store the argument attribute ID in the low 8 bits and the function attribute ID in the upper 8 bits. This reduces the number of cases to handle in the generated switch from 368 to 131, a ~2.8x reduction in the number of switch cases. Also eliminate the fixed-size array `AS` and the `NumAttrs` variable, and instead call `AttributeList::get` directly from each case, with an inline array of the <index, AttributeSet> pairs.
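Here is a minimal sketch of the 16-bit encoding this message describes, with the function attribute ID in the upper byte and the argument attribute ID in the lower byte. The helper names are hypothetical, and the generated code itself is not reproduced here.

```python
def pack_attr_ids(fn_attr_id: int, arg_attr_id: int) -> int:
    """Pack two 8-bit IDs: function attributes in the upper byte, argument attributes in the lower byte."""
    if not (0 <= fn_attr_id <= 0xFF and 0 <= arg_attr_id <= 0xFF):
        raise ValueError("each attribute ID must fit in 8 bits")
    return (fn_attr_id << 8) | arg_attr_id


def unpack_attr_ids(packed: int):
    """Inverse of pack_attr_ids: returns (fn_attr_id, arg_attr_id)."""
    return packed >> 8, packed & 0xFF


packed = pack_attr_ids(3, 17)
print(packed, unpack_attr_ids(packed))  # 785 (3, 17)
```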

…m#109719) Add support for taking the intersection of two AttributeLists such that the result list contains attributes that are valid in the context of both inputs. For example, if we have `nonnull align(32) noundef` intersected with `nonnull align(16) dereferenceable(10)`, the result is `nonnull align(16)`. It also handles attributes that are not droppable. For example, dropping `byval` can change the nature of a callsite/function, so it's impossible to compute a correct intersection if it's dropped from the result; e.g. `nonnull byval(i64)` intersected with `nonnull` is invalid. The motivation for this infrastructure is to enable sinking/hoisting callsites with differing attributes.
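The following is a minimal sketch of the intersection rules in the examples above, using a toy model rather than LLVM's AttributeList API. An attribute set is a dict from name to value, with None for flag attributes. The model assumes `align` and `dereferenceable` keep the smaller value and that `byval` is the only non-droppable attribute. Those category sets, and the function name, are hypothetical simplifications.

```python
# Hypothetical model: an attribute set maps names to values (None for flag attributes).
MIN_VALUE_ATTRS = {"align", "dereferenceable"}  # keep the weaker (smaller) guarantee
NON_DROPPABLE_ATTRS = {"byval"}                 # must match exactly on both sides


def intersect_attrs(a: dict, b: dict):
    """Return attributes valid for both inputs, or None if no valid intersection exists."""
    for name in NON_DROPPABLE_ATTRS:
        in_a, in_b = name in a, name in b
        if in_a != in_b or (in_a and a[name] != b[name]):
            return None  # dropping it would change the call's meaning
    result = {}
    for name in a:
        if name not in b:
            continue  # present on only one side: drop it
        if name in MIN_VALUE_ATTRS:
            result[name] = min(a[name], b[name])
        elif a[name] == b[name]:
            result[name] = a[name]
        # Other attributes whose values differ are dropped.
    return result


lhs = {"nonnull": None, "align": 32, "noundef": None}
rhs = {"nonnull": None, "align": 16, "dereferenceable": 10}
print(intersect_attrs(lhs, rhs))  # {'nonnull': None, 'align': 16}
print(intersect_attrs({"nonnull": None, "byval": "i64"}, {"nonnull": None}))  # None
```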

[AutoBump] Merge with fixes of 005f815 (Sep 30) (4)

[AutoBump] Merge with afc0557 (Oct 01) (5)

MaskRay and others added 30 commits September 29, 2024 15:54

[ELF] Pass Ctx & to Writer

cc6c059

[ELF] Pass Ctx & to InputFiles and SyntheticSections

079b832

[ELF] Pass Ctx & to Relocations

c490d34

[InstCombine] Fold `icmp eq/ne (X *nw Z), (Y *nw Z) -> icmp eq/ne Z, …

1efd122

…0` when `X != Y` (llvm#110413) Alive2: https://alive2.llvm.org/ce/z/9oDP6K I found this pattern in https://github.com/casadi/casadi/blob/04e75858d7e626dda62d83b862fc89fc26f52745/casadi/core/repmat.cpp#L70-L78.

[ORC-RT] Rename sections_tracker.h to record_section_tracker.h.

6292f11

This matches the type name defined in this header.

[SPIR-V] Fix of OpString separator in DI test (llvm#110249)

6f3c151

Windows uses a different path separator than Unix-based operating systems. One of the tests in debug-compilation-unit.ll lacked a Windows-compatible '\\' variant, which broke the test suite on that OS.

AMDGPU: Fix assertion on load of vector of pointers (llvm#110436)

a87640c

Fix InferAddressSpaces asserting on a load of a vector of flat pointers. Fixes llvm#110433

[clang][analyzer] Move 'alpha.core.PointerSub' checker into 'security…

0d384fe

….PointerSub' (llvm#107596)

[LV] Reuse VPReplicateRecipe to handle scalar stores in exit block. (l…

f8373cb

…lvm#106342) This patch separates the computation of the final reduction result from the intermediate stores of the reduction. --------- Co-authored-by: Florian Hahn <[email protected]>

[MemCpyOpt] Use EarliestEscapeInfo (llvm#110280)

f5c02dd

Pass EarliestEscapeInfo to BatchAA in MemCpyOpt. This allows memcpy elimination in cases where one of the involved pointers is captured after the relevant memcpy/call.

[bazel] Fix build past 6292f11 (llvm#110459)

dd2792a

Fix test for PR#109680

f627c45

The patch llvm#109680 is failing because of the test sme-callee-save-restore-pairs.ll. This patch fixes the output of the test.

AMDGPU: Add test for 16 bit unsigned scratch offsets (llvm#110255)

e9d12a6

A large scratch offset with the highest bit set is selected as negative, because a negative offset has the same 16-bit binary representation as a large unsigned offset.

[abi] [ItaniumMangle] Remove a test case that fails due to expected r… (

93eaa99

llvm#110467) …edefinition failures.

AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (l…

83fe851

…lvm#110256) Use i32 for offset instead of i16, this way it does not get interpreted as negative 16 bit offset.

AMDGPU: Make a frame index test more realistic

8e0daab

We do not expect to see live carry-out outputs on these adds, so add a dead flag. Split the test for the degenerate case. This makes it more apparent that a regression in a future commit does not matter.

[AMDGPU] Rename LocalMemorySize features to AddressableLocalMemorySize (

6f956e3

llvm#110242) Change the names of the TableGen features to match the names used by AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed by a single workgroup. Add some explanatory comments. NFC.

DAG: Handle vector legalization of minimumnum/maximumnum (llvm#109779)

5883ad3

Follow the same patterns as the other min/max variants.

arsenm and others added 24 commits October 1, 2024 18:54

AMDGPU: Fix executable permissions on file

dc98482

[clang][bytecode] Check GetPtrBase ops for null pointers (llvm#110673)

55c70f6

[clang][bytecode] Implement ia32_{pdep,pext} builtins (llvm#110675)

f3baa73

LoopSimplify: strip dependency on DA (NFC) (llvm#107379)

9f6f6af

Since no passes compute DependenceAnalysis via the PassManager, there is no value in preserving it here. Hence, strip the unnecessary dependency on DependenceAnalysis.

[AMDGPU] Refactor several functions for merging with downstream work. (…

c66dee4

…llvm#110562) For setScore, the root function is setScoreByInterval with a RegInterval input. For determineWait, the root function is determineWait with a RegInterval input.

[llvm][OMPIRBuilder] Avoid Type::getPointerTo() (NFC) (llvm#110678)

d071fda

`Type::getPointerTo()` is to be deprecated & removed soon.

[libc][stdio] Use proxy headers of stdio.h in src and test folders. (l…

c63112a

…lvm#110067) llvm#60481

[MLIR][OpenMP] Document entry block argument-defining clauses (NFC) (l…

4e52e6a

…lvm#109811) This patch adds general information on the proposed approach to unify the handling and representation of clauses that define entry block arguments attached to operations that accept them.

AMDGPU: Add missing tests for local stack alloc s_add_i32 handling

f61abee

None of these tested the case where the non-frame index operand was a register.

[libc++] Remove potential 0-sized array in __compressed_pair_padding (l…

0eb2602

…lvm#109028)

[SLP][NFC]Add a test with external cast and extracted operand, NFC

0dab022

[AutoBump] Merge with 357c197 (Sep 30)

ce9a9cc

[AutoBump] Merge with fixes of 005f815 (Sep 30)

b659815

Merge pull request #440 from Xilinx/bump_to_005f8153

f233f51

[AutoBump] Merge with fixes of 005f815 (Sep 30) (4)

[AutoBump] Merge with afc0557 (Oct 01)

a3b7199

Base automatically changed from bump_to_c6876b4e to feature/fused-ops January 10, 2025 15:23

mgehre-amd requested a review from jorickert January 13, 2025 08:20

Merge pull request #441 from Xilinx/bump_to_afc0557a

44962f7

[AutoBump] Merge with afc0557 (Oct 01) (5)

jorickert approved these changes Jan 13, 2025


mgehre-amd merged commit 038de4e into feature/fused-ops Jan 13, 2025
43 of 44 checks passed

mgehre-amd deleted the bump_to_357c1970 branch January 13, 2025 12:34
