forked from llvm/llvm-project
[AutoBump] Merge with 357c1970 (Sep 30) (3) #439
Merged
…Type` (llvm#109435) This PR fixes a bug in `SparseTensorDimOpRewriter` when `tensor.dim` has an unranked tensor type. To prevent crashes, we now use `tryGetSparseTensorType` instead of `getSparseTensorType`. Fixes llvm#107807.
Align i128s to 16 bytes, following the example at https://reviews.llvm.org/D86310. clang already does this implicitly, but do it in backend code too for the benefit of other frontends (see e.g llvm#102783 & rust-lang/rust#128950).
…FC) (llvm#110432) I'm trying to speed up the reaching def analysis by changing the underlying data structure. Turning MBBReachingDefsInfo into a proper class decouples the data structure and its users. This patch does not change the existing three-dimensional vector structure. --------- Co-authored-by: Nikita Popov <[email protected]>
llvm#110272) Currently, callers of analyze can't get detailed information about a missing header, e.g. the resolved path. The only way to get at this is to use the low-level walkUsed function, which is far more complicated than just calling analyze. This enables further analysis: e.g. when includes are spelled relative to inner directories, the caller can still know their path relative to the repository root.
This matches the type name defined in this header.
… `__hlsl_resource_t` builtin type (llvm#110079) Replace `element_type*` handles in HLSLExternalSemaSource with the `__hlsl_resource_t` builtin type. The handle used to be defined as `element_type*`, which was used by the provisional subscript operator implementation. Now that the handle is `__hlsl_resource_t`, the subscript placeholder implementation was updated to add an `element_type* e;` field to the resource struct and return a reference to that. This field is just a temporary workaround until the indexing is implemented properly in llvm#95956, at which point the field will be removed. This seemed like a better solution than disabling many of the existing tests that already use the `[]` operator. One test has to be disabled nevertheless because of an error based on interactions of const and template instantiation (a potential bug that can be investigated once indexing is implemented the right way). Fixes llvm#84824
Windows uses different path separators than Unix-based OSes. One of the tests in debug-compilation-unit.ll lacked the Windows-style '\\' variant, which broke the test suite on that OS.
…m#107190) The DominatorTree version is marked for deprecation, so we use the DomTreeUpdater version. We also update sinkRegion() to iterate over basic blocks instead of DomTreeNodes. The loop body calls SplitBlockPredecessors. The DTU version calls DomTreeUpdater::apply_updates(), which may call DominatorTree::reset(). This invalidates the worklist of DomTreeNodes to iterate over.
…lvm#110083) According to ChuanqiXu9/clangd-for-modules#9, I found, to my surprise, that the support for C++20 modules doesn't handle code completion well. After debugging, I found two problems: (1) We forgot to call `adjustHeaderSearchOptions` in code completion. This may be an easy oversight. (2) In `CodeCompleteOptions::getClangCompleteOpts`, we may set `LoadExternal` to false when an index is available. But we do support modules with an index, so the two conflict. Given that modules are opt-in now, I think it makes sense to set `LoadExternal` to true when modules are enabled. This is a small fix and I hope it can land quickly.
Fix InferAddressSpaces asserting on a load of a vector of flat pointers. Fixes llvm#110433
llvm#110247) …aint that depends on a template parameter from an enclosing template as members of the enclosing class. Such function templates should be considered member-like constrained friends per [temp.friend]p9 and itanium-cxx-abi/cxx-abi#24 (comment)).
…lvm#106342) This patch separates the computation of the final reduction result and the intermediate stores of reduction. --------- Co-authored-by: Florian Hahn <[email protected]>
Pass EarliestEscapeInfo to BatchAA in MemCpyOpt. This allows memcpy elimination in cases where one of the involved pointers is captured after the relevant memcpy/call.
llvm#109680) …me lowering SME instructions can only be used in streaming mode. PTRUE for the predicated counter and the ld/st pair can be used when sve2.1 is available, or when sme2 is available and the function is in streaming mode. Previously, the frame lowering only checked whether sme2 was available when building the machine instruction. This fix checks that sme2 is available and that the subtarget is in streaming mode.
SimplifyCFG store speculation currently has some homegrown code to check for a writable object, handling the alloca special case only. Switch it to use the generic isWritableObject() API, which means that we also support byval arguments, allocator return values, and writable arguments. I've adjusted isWritableObject() to also check for the noalias attribute when handling writable. Otherwise, I don't think that we can generalize from at-entry writability. This was not relevant for previous uses of the function, because they'd already require noalias for other reasons anyway.
The patch llvm#109680 is failing because of the test sme-callee-save-restore-pairs.ll. This patch fixes the output of the test.
A large scratch offset with its highest bit set is selected as negative, because a negative offset has the same 16-bit binary representation as a large unsigned offset.
llvm#110467) …edefinition failures.
…lvm#110256) Use i32 for offset instead of i16, this way it does not get interpreted as negative 16 bit offset.
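The sign problem described in the two offset-related commits can be illustrated with plain integer conversions. This is only a minimal sketch: the helper names `asSigned16` and `asSigned32` are hypothetical and do not come from the patch, and the example assumes a two's complement target (guaranteed from C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: view the low 16 bits of an offset as a signed 16-bit
// value, then widen it back to 32 bits. Assumes two's complement narrowing.
inline int32_t asSigned16(uint32_t offset) {
  return static_cast<int32_t>(
      static_cast<int16_t>(static_cast<uint16_t>(offset)));
}

// Hypothetical helper: keep the same offset in a signed 32-bit type.
inline int32_t asSigned32(uint32_t offset) {
  return static_cast<int32_t>(offset);
}
```

With these helpers, `asSigned16(0x8000u)` yields -32768 while `asSigned32(0x8000u)` yields 32768, which is why a wider offset type avoids the misinterpretation.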
We do not expect to see live carry out outputs on these adds, so add a dead flag. Split the test for the degenerate case. This makes it more apparent a regression in a future commit does not matter.
…cer shared by terminator ops (llvm#110105)
-- This commit extends consumer fusion to take place even if the producer has multiple uses.
-- Multiple uses here means that, besides the consumer op concerned, the only other uses of the producer allowed are in:
1. scf.yield
2. tensor.parallel_insert_slice
Signed-off-by: Abhishek Varma <[email protected]>
llvm#110242) Change the names of the TableGen features to match the names used by AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed by a single workgroup. Add some explanatory comments. NFC.
…109424) Our support for derived types uses `getTypeSizeAndAlignment` to calculate the offset of the members. The `fir.box` type was not supported in that function, which meant that any member that required a descriptor was not supported in the derived type. We now convert the type into an llvm type and then use the DataLayout to calculate the size/offset of a member. There is no longer a dependency on `getTypeSizeAndAlignment` to get the size of the types. There are 2 other changes in this PR:
1. The `recID` field is used to handle cases where a member references its parent type.
2. A type cache is maintained to avoid duplication. It is also needed for the circular reference case.
Fixes llvm#108001.
Follow the same patterns as the other min/max variants.
…m#110108) Most of PAuth-related code counts the instructions being inserted and asserts that no more bytes are emitted than the size returned by the getInstSizeInBytes(MI) method. This check seems useful not only for PAuth-related instructions. Also, reimplementing it globally in AArch64AsmPrinter makes it more robust and simplifies further refactoring of PAuth-related code.
When combining two geps into one by adding the offsets, we have to take some care when intersecting the flags, because nusw flags cannot be straightforwardly preserved. Add a helper for this on GEPNoWrapFlags so we won't have to repeat this logic in various places.
…lvm#110672) As a proxy criterion, mesa targets have unaligned-access-mode (which determines whether the hardware allows unaligned memory accesses) not set whereas amdhsa targets do. This PR changes tests to use amdhsa instead of mesa and inserts additional checks with unaligned-access-mode unset explicitly. This is in preparation for PR llvm#110219, which will generate different code depending on the unaligned-access-mode.
…erlapping Def/Use (llvm#109875) The current RP handling for uses of an MI that overlap with defs is confusing and unnecessary. Moreover, the lane masks do not accurately model the liveness behavior of the subregs. This cleans things up a bit and more accurately models subreg lane liveness by sinking the use handling into the subsequent Uses loop. The effect of this PR is to replace A. `increaseRegPressure(Reg, LiveAfter, ~LiveAfter & LiveBefore)` with B. `increaseRegPressure(Reg, LiveAfter, LiveBefore)`. Note that A (Defs loop) and B (Uses loop) have different definitions of LiveBefore: A. `LiveBefore = (LiveAfter & ~DefLanes) | UseLanes` and B. `LiveBefore = LiveAfter | UseLanes`. Also note that `increaseRegPressure` will exit if `PrevMask` (`LiveAfter` for both A/B) has any active lanes, thus these calls will only have an effect if `LiveAfter` is 0. A. NewMask = ~LiveAfter & ((LiveAfter & ~DefLanes) | UseLanes) => (1 & UseLanes) => UseLanes = (0 | UseLanes) => (LiveAfter | UseLanes) = NewMask B.
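The equivalence argued above can be checked by brute force on small masks. The sketch below is not LLVM code: `LaneMask` is a plain integer stand-in, and `newMaskA`, `newMaskB` and `masksAgreeWhenLiveAfterIsZero` are hypothetical names that only mirror the two `LiveBefore` definitions quoted in the commit message. The B case is written from those definitions, since the derivation for B is cut off in the text.

```cpp
#include <cassert>
#include <cstdint>

using LaneMask = uint32_t; // toy stand-in for a lane mask (4 lanes used)

// Third argument of the old call in the Defs loop (case A):
// ~LiveAfter & LiveBefore, with LiveBefore = (LiveAfter & ~DefLanes) | UseLanes.
inline LaneMask newMaskA(LaneMask liveAfter, LaneMask defLanes,
                         LaneMask useLanes) {
  LaneMask liveBefore = (liveAfter & ~defLanes) | useLanes;
  return ~liveAfter & liveBefore;
}

// Third argument of the new call in the Uses loop (case B):
// LiveBefore = LiveAfter | UseLanes.
inline LaneMask newMaskB(LaneMask liveAfter, LaneMask useLanes) {
  return liveAfter | useLanes;
}

// Exhaustively compare both forms for LiveAfter == 0 over all 4-lane masks.
inline bool masksAgreeWhenLiveAfterIsZero() {
  for (LaneMask def = 0; def < 16; ++def)
    for (LaneMask use = 0; use < 16; ++use)
      if (newMaskA(0, def, use) != newMaskB(0, use))
        return false;
  return true;
}
```

In this toy model both forms reduce to `UseLanes` whenever `LiveAfter` is 0, which is the only case in which the commit message says the call has an effect.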
…lauses (llvm#109809) This patch updates printing and parsing of operations including clauses that define entry block arguments to the operation's region. This impacts `in_reduction`, `map`, `private`, `reduction` and `task_reduction`. The proposed representation to be used by all such clauses is the following:

```
<clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) {
  ...
}
```

The `byref` tag is only allowed for reduction-like clauses and the `@<sym>` is required and only allowed for the `private` and reduction-like clauses. The `map` clause does not accept any of these two. This change fixes some currently broken op representations, like `omp.teams` or `omp.sections` reduction:

```
omp.teams reduction([byref] @<sym> -> %value : <type>) {
^bb0(%block_arg : <type>):
  ...
}
```

Additionally, it addresses some redundancy in the representation of the previously mentioned cases, as well as e.g. `map` in `omp.target`. The problem is that the block argument name after the arrow is not checked in any way, which makes some misleading representations legal:

```mlir
omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr):
  ...
}
```

In that case, `%x` maps to `%arg0`, contrary to what the representation states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it would likely cause issues if used anywhere inside of the operation's region. The solution implemented in this patch makes it so that values introduced after the arrow on the representation of these clauses implicitly define the corresponding entry block arguments, removing the potential for these problematic representations. This is what is already implemented for the `private` and `reduction` clauses of `omp.parallel`.
There are a couple of consequences of this change:
- Entry block argument-defining clauses must come at the end of the operation's representation and in alphabetical order. This is because they are printed/parsed as part of the region and a standardized ordering is needed to reliably match op arguments with their corresponding entry block arguments via the `BlockArgOpenMPOpInterface`.
- We can no longer define per-clause assembly formats to be reused by all operations that take these clauses, since they must be passed to a custom printer including the region and arguments of all other entry block argument-defining clauses. Code duplication and potential for introducing issues is minimized by providing the generic `{print,parse}BlockArgRegion` helpers and associated structures.

MLIR and Flang lowering unit tests are updated due to changes in the order and formatting of impacted operations.
Since no passes compute DependenceAnalysis via the PassManager, there is no value in preserving it here. Hence, strip the unnecessary dependency on DependenceAnalysis.
…llvm#110562) For setScore, the root function is setScoreByInterval with RegInterval input. For determineWait, the root function is determineWait with RegInterval input.
…m#109810) This patch updates the `omp.target_data` operation to use the same formatting as `map` clauses on `omp.target` for `use_device_addr` and `use_device_ptr`. This is done so the mapping that is being enforced between op arguments and associated entry block arguments is explicit. The way it is achieved is by marking these clauses as entry block argument-defining and adjusting printer/parsers accordingly. As a result of this change, block arguments for `use_device_addr` come before those for `use_device_ptr`, which is the opposite of the previous undocumented situation. Some unit tests are updated based on this change, in addition to those updated because of the format change.
`Type::getPointerTo()` is to be deprecated & removed soon.
…lvm#109811) This patch adds general information on the proposed approach to unify the handling and representation of clauses that define entry block arguments attached to operations that accept them.
None of these tested the case where the non-frame index operand was a register.
) The `omp.section` operation is an outlier in that the block arguments it has are defined by clauses on the required parent `omp.sections` operation. This patch updates the definition of this operation introducing the `BlockArgOpenMPOpInterface` to simplify the handling and verification of these block arguments, implemented based on the parent `omp.sections`.
…vm#110573) Decrease the code size of the `Intrinsic::getAttributes` function by uniquing the function and argument attributes separately and using the `IntrinsicsToAttributesMap` to store the argument attribute ID in the low 8 bits and the function attribute ID in the upper 8 bits. This reduces the number of cases to handle in the generated switch from 368 to 131, which is a ~2.8x reduction in the number of switch cases. Also eliminate the fixed-size array `AS` and the `NumAttrs` variable, and instead call `AttributeList::get` directly from each case, with an inline array of the <index, AttributeSet> pairs.
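The low/high byte packing described above can be sketched with a 16-bit value. The helper names `packAttrIDs`, `unpackArgID` and `unpackFnID` are hypothetical and only illustrate the bit layout stated in the commit message; they are not the generated TableGen code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: function attribute ID in the upper 8 bits,
// argument attribute ID in the low 8 bits of one 16-bit entry.
constexpr uint16_t packAttrIDs(uint8_t fnID, uint8_t argID) {
  return static_cast<uint16_t>((static_cast<unsigned>(fnID) << 8) | argID);
}

// Recover the argument attribute ID from the low 8 bits.
constexpr uint8_t unpackArgID(uint16_t packed) {
  return static_cast<uint8_t>(packed & 0xFF);
}

// Recover the function attribute ID from the upper 8 bits.
constexpr uint8_t unpackFnID(uint16_t packed) {
  return static_cast<uint8_t>(packed >> 8);
}
```

Because the two IDs are uniqued separately, a switch only needs one case per distinct ID rather than one per distinct pair, which is consistent with the reduction in switch cases the commit reports.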
…m#109719) Add support for taking the intersection of two AttributeLists such that the result list contains attributes that are valid in the context of both inputs. For example, if we have `nonnull align(32) noundef` intersected with `nonnull align(16) dereferenceable(10)`, the result is `nonnull align(16)`. Further, it handles attributes that are not droppable. For example, dropping `byval` can change the nature of a callsite/function, so it is impossible to construct a correct intersection if it is dropped from the result; i.e. `nonnull byval(i64)` intersected with `nonnull` is invalid. The motivation for the infrastructure is to enable sinking/hoisting callsites with differing attributes.
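The two examples in this commit message can be reproduced with a toy model. This is a sketch under stated assumptions, not the LLVM implementation: `ToyAttrs`, `isNonDroppable` and `intersectToy` are hypothetical names, only `byval` is treated as non-droppable, its type payload is left out, and integer payloads are merged by taking the minimum, a rule chosen only because it matches the `align` example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Toy attribute set: name -> optional integer payload (e.g. align -> 16).
using ToyAttrs = std::map<std::string, std::optional<uint64_t>>;

// Assumption for this sketch: only `byval` is non-droppable.
inline bool isNonDroppable(const std::string &name) { return name == "byval"; }

// Returns the intersection, or std::nullopt when no valid one exists.
inline std::optional<ToyAttrs> intersectToy(const ToyAttrs &a,
                                            const ToyAttrs &b) {
  ToyAttrs result;
  for (const auto &[name, value] : a) {
    auto it = b.find(name);
    if (it == b.end()) {
      // Present on one side only: drop it, unless dropping is not allowed.
      if (isNonDroppable(name))
        return std::nullopt;
      continue;
    }
    if (value && it->second)
      result[name] = std::min(*value, *it->second); // assumed merge rule
    else if (!value && !it->second)
      result[name] = std::nullopt; // flag-like attribute kept as-is
    // Mixed payload/no-payload entries are dropped in this toy model.
  }
  // A non-droppable attribute that appears only in `b` is also invalid.
  for (const auto &entry : b)
    if (isNonDroppable(entry.first) && a.find(entry.first) == a.end())
      return std::nullopt;
  return result;
}
```

Intersecting `{nonnull, align 32, noundef}` with `{nonnull, align 16, dereferenceable 10}` in this model keeps only `nonnull` and `align 16`, while `{nonnull, byval}` intersected with `{nonnull}` produces no result.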
[AutoBump] Merge with fixes of 005f815 (Sep 30) (4)
[AutoBump] Merge with afc0557 (Oct 01) (5)
jorickert
approved these changes
Jan 13, 2025