Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with 357c1970 (Sep 30) (3) #439

Merged
merged 474 commits into from
Jan 13, 2025

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

MaskRay and others added 30 commits September 29, 2024 15:54
…Type` (llvm#109435)

This PR fixes a bug in `SparseTensorDimOpRewriter` when `tensor.dim` has
an unranked tensor type. To prevent crashes, we now use
`tryGetSparseTensorType` instead of `getSparseTensorType`. Fixes
llvm#107807.
Align i128s to 16 bytes, following the example at
https://reviews.llvm.org/D86310.

clang already does this implicitly, but do it in backend code too for
the benefit of other frontends (see e.g
llvm#102783 &
rust-lang/rust#128950).
…FC) (llvm#110432)

I'm trying to speed up the reaching def analysis by changing the
underlying data structure.  Turning MBBReachingDefsInfo into a proper
class decouples the data structure and its users.  This patch does not
change the existing three-dimensional vector structure.

---------

Co-authored-by: Nikita Popov <[email protected]>
llvm#110272)

Currently callers of analyze can't get detailed information about a
missing header, e.g. resolve path. Only way to get at this is to use low
level walkUsed funciton, which is way more complicated than just calling
analyze.

This enables further analysis, e.g. when includes are spelled relative
to inner directories, caller can still know their path relative to
repository root.
This matches the type name defined in this header.
… `__hlsl_resource_t` builtin type (llvm#110079)

Replace `element_type*` handles in HLSLExternalSemaSource with
`__hlsl_resource_t` builtin type.

The handle used to be defined as `element_type*` which was used by the
provisional subscript operator implementation. Now that the handle is
`__hlsl_resource_t` the subscript placeholder implementation was updated
to add `element_type* e;` field to the resource struct. and return a
reference to that. This field is just a temporary workaround until the
indexing is implemented properly in llvm#95956, at which
point the field will be removed. This seemed like a better solution than
disabling many of the existing tests that already use the `[]` operator.
One test has to be disabled nevertheless because an error based on
interactions of const and template instantiation (potential bug that can
be investigated once indexing is implemented the right way).

Fixes llvm#84824
Windows have different separators for paths than Unix based OS. One of
the tests in debug-compilation-unit.ll didn't have Win supported '\\'
variant which broken test suite on that OS.
…m#107190)

The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
…lvm#110083)

According to ChuanqiXu9/clangd-for-modules#9,
I surprisingly found the support for C++20 modules doesn't support code
completion well.

After debugging, I found there are problems:
(1) We forgot to call `adjustHeaderSearchOptions` in code complete. This
may be an easy oversight.
(2) In `CodeCompleteOptions::getClangCompleteOpts`, we may set
`LoadExternal` as false when index is available. But we have support
modules with index. So it is conflicting. Given modules are opt in now,
I think it makes sense to to set LoadExternal as true when modules are
enabled.

This is a small fix and I wish it can land faster.
Fix InferAddressSpaces asserting on a load of a vector of flat
pointers.

Fixes llvm#110433
llvm#110247)

…aint that depends on a template parameter from an enclosing template as
members of the enclosing class.

Such function templates should be considered member-like constrained
friends per [temp.friend]p9 and
itanium-cxx-abi/cxx-abi#24 (comment)).
…lvm#106342)

This patch separates the computation of the final reduction result and
the intermediate stores of reduction.

---------

Co-authored-by: Florian Hahn <[email protected]>
Pass EarliestEscapeInfo to BatchAA in MemCpyOpt. This allows memcpy
elimination in cases where one of the involved pointers is captured
after the relevant memcpy/call.
llvm#109680)

…me lowering

SME instructions can only be used in streaming mode. PTRUE for
predicated counter and the ld/st pair can be used when:
  sve2.1  is available or
  sme2 available in function in streaming mode.
Previously the frame lowering only checking if sme2 available when
building the machine instruction.
This fix checks if sme2 is available and is subtarget in streaming mode
SimplifyCFG store speculation currently has some homegrown code to check
for a writable object, handling the alloca special case only.

Switch it to use the generic isWritableObject() API, which means that we
also support byval arguments, allocator return values, and writable
arguments.

I've adjusted isWritableObject() to also check for the noalias attribute
when handling writable. Otherwise, I don't think that we can generalize
from at-entry writability. This was not relevant for previous uses of
the function, because they'd already require noalias for other reasons
anyway.
The patch llvm#109680 is failing
because of the test sme-callee-save-restore-pairs.ll.
This patch fixes the output of the test
Large scratch offset with one on highest bit selected as negative,
negative offset has same binary representation in 16 bits as large
unsigned offset.
…lvm#110256)

Use i32 for offset instead of i16, this way it does not get interpreted
as negative 16 bit offset.
We do not expect to see live carry out outputs on these adds,
so add a dead flag. Split the test for the degenerate case. This
makes it more apparent a regression in a future commit does not matter.
…cer shared by terminator ops (llvm#110105)

-- This commit extends consumer fusion to take place even if the
producer has multiple uses.
-- The multiple uses of the producer essentially means that besides the
consumer op in concern, the only other uses of the producer are
allowed in :-
   1. scf.yield
   2. tensor.parallel_insert_slice

Signed-off-by: Abhishek Varma <[email protected]>
llvm#110242)

Change the names of the TableGen features to match the names used by
AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed
by a single workgroup. Add some explanatory comments. NFC.
…109424)

Our support for derived types uses `getTypeSizeAndAlignment` to
calculate the offset of the members. The `fir.box` was not supported in
that function. It meant that any member which required descriptor was
not supported in the derived type.
    
We convert the type into an llvm type and then use the DataLayout to
calculate the size/offset of a member. There is no dependency on
`getTypeSizeAndAlignment` to get the size of the types.

There are 2 other changes in this PR:

1. The `recID` field is used to handle cases where we have a member
references its parent type.

2. A type cache is maintained to avoid duplication. It is also needed
for circular reference case.


Fixes llvm#108001.
Follow the same patterns as the other min/max variants.
arsenm and others added 24 commits October 1, 2024 18:54
…m#110108)

Most of PAuth-related code counts the instructions being inserted and
asserts that no more bytes are emitted than the size returned by the
getInstSizeInBytes(MI) method. This check seems useful not only for
PAuth-related instructions. Also, reimplementing it globally in
AArch64AsmPrinter makes it more robust and simplifies further
refactoring of PAuth-related code.
When combining two geps into one by adding the offsets, we have
to take some care when intersecting the flags, because nusw flags
cannot be straightforwardly preserved.

Add a helper for this on GEPNoWrapFlags so we won't have to repeat
this logic in various places.
…lvm#110672)

As a proxy criterion, mesa targets have unaligned-access-mode (which
determines whether the hardware allows unaligned memory accesses) not
set whereas amdhsa targets do. This PR changes tests to use amdhsa
instead of mesa and inserts additional checks with unaligned-access-mode
unset explicitly.

This is in preparation for PR llvm#110219, which will generate different
code depending on the unaligned-access-mode.
…erlapping Def/Use (llvm#109875)

The current RP handling for uses of an MI that overlap with defs is
confusing and unnecessary. Moreover, the lane masks do not accurately
model the liveness behavior of the subregs. This cleans things up a bit
and more accurately models subreg lane liveness by sinking the use
handling into subsent Uses loop.

The effect of this PR is to replace

A. `increaseRegPressure(Reg, LiveAfter, ~LiveAfter & LiveBefore)`

with 

B. `increaseRegPressure(Reg, LiveAfter, LiveBefore)`

Note that A (Defs loop) and B (Uses loop) have different definitions of
LiveBefore

A. `LiveBefore = (LiveAfter & ~DefLanes) | UseLanes`

and 

B. `LiveBefore =  LiveAfter | UseLanes`

Also note, `increaseRegPressure` will exit if `PrevMask` (`LiveAfter`
for both A/B) has any active lanes, thus these calls will only have an
effect if `LiveAfter` is 0.


A. NewMask = ~LiveAfter & ((LiveAfter & ~DefLanes) | UseLanes) => (1 &
UseLanes) => UseLanes = (0 | UseLanes) => (LiveAfter | UseLanes) =
NewMask B.
…lauses (llvm#109809)

This patch updates printing and parsing of operations including clauses
that define entry block arguments to the operation's region. This
impacts `in_reduction`, `map`, `private`, `reduction` and
`task_reduction`.

The proposed representation to be used by all such clauses is the
following:
```
<clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) {
  ...
}
```

The `byref` tag is only allowed for reduction-like clauses and the
`@<sym>` is required and only allowed for the `private` and
reduction-like clauses. The `map` clause does not accept any of these
two.

This change fixes some currently broken op representations, like
`omp.teams` or `omp.sections` reduction:
```
omp.teams reduction([byref] @<sym> -> %value : <type>) {
^bb0(%block_arg : <type>):
  ...
}
```

Additionally, it addresses some redundancy in the representation of the
previously mentioned cases, as well as e.g. `map` in `omp.target`. The
problem is that the block argument name after the arrow is not checked
in any way, which makes some misleading representations legal:
```mlir
omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr):
  ...
}
```

In that case, `%x` maps to `%arg0`, contrary to what the representation
states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it
would likely cause issues if used anywhere inside of the operation's
region.

The solution implemented in this patch makes it so that values
introduced after the arrow on the representation of these clauses
implicitly define the corresponding entry block arguments, removing the
potential for these problematic representations. This is what is already
implemented for the `private` and `reduction` clauses of `omp.parallel`.

There are a couple of consequences of this change:
- Entry block argument-defining clauses must come at the end of the
operation's representation and in alphabetical order. This is because
they are printed/parsed as part of the region and a standardized
ordering is needed to reliably match op arguments with their
corresponding entry block arguments via the `BlockArgOpenMPOpInterface`.
- We can no longer define per-clause assembly formats to be reused by
all operations that take these clauses, since they must be passed to a
custom printer including the region and arguments of all other entry
block argument-defining clauses. Code duplication and potential for
introducing issues is minimized by providing the generic
`{print,parse}BlockArgRegion` helpers and associated structures.

MLIR and Flang lowering unit tests are updated due to changes in the
order and formatting of impacted operations.
Since no passes compute DependenceAnalysis via the PassManager, there is
no value in preserving it here. Hence, strip the unnecessary dependency
on DependenceAnalysis.
…llvm#110562)

For setScore, the root function is setScoreByInterval with RegInterval
input
For determineWait, the root function is determineWait with RegInterval
input
…m#109810)

This patch updates the `omp.target_data` operation to use the same
formatting as `map` clauses on `omp.target` for `use_device_addr` and
`use_device_ptr`. This is done so the mapping that is being enforced
between op arguments and associated entry block arguments is explicit.

The way it is achieved is by marking these clauses as entry block
argument-defining and adjusting printer/parsers accordingly.

As a result of this change, block arguments for `use_device_addr` come
before those for `use_device_ptr`, which is the opposite of the previous
undocumented situation. Some unit tests are updated based on this
change, in addition to those updated because of the format change.
`Type::getPointerTo()` is to be deprecated & removed soon.
…lvm#109811)

This patch adds general information on the proposed approach to unify
the handling and representation of clauses that define entry block
arguments attached to operations that accept them.
None of these tested the case where the non-frame index operand
was a register.
)

The `omp.section` operation is an outlier in that the block arguments it
has are defined by clauses on the required parent `omp.sections`
operation.

This patch updates the definition of this operation introducing the
`BlockArgOpenMPOpInterface` to simplify the handling and verification of
these block arguments, implemented based on the parent `omp.sections`.
…vm#110573)

Decrease code size of `Intrinsic::getAttributes` function by uniquing
the function and argument attributes separately and using the
`IntrinsicsToAttributesMap` to store argument attribute ID in low 8 bits
and function attribute ID in upper 8 bits.

This reduces the number of cases to handle in the generated switch from
368 to 131, which is ~2.8x reduction in the number of switch cases.

Also eliminate the fixed size array `AS` and `NumAttrs` variable, and
instead call `AttributeList::get` directly from each case, with an
inline array of the <index, AttribueSet> pairs.
…m#109719)

Add support for taking the intersection of two AttributeLists s.t the
result list contains attributes that are valid in the context of both
inputs.

i.e if we have `nonnull align(32) noundef` intersected with `nonnull
align(16) dereferenceable(10)`, the result is `nonnull align(16)`.

Further it handles attributes that are not-droppable. For example
dropping `byval` can change the nature of a callsite/function so its
impossible to correct a correct intersection if its dropped from the
result. i.e `nonnull byval(i64)` intersected with `nonnull` is
invalid.

The motivation for the infrastructure is to enable sinking/hoisting
callsites with differing attributes.
[AutoBump] Merge with fixes of 005f815 (Sep 30) (4)
Base automatically changed from bump_to_c6876b4e to feature/fused-ops January 10, 2025 15:23
@mgehre-amd mgehre-amd requested a review from jorickert January 13, 2025 08:20
[AutoBump] Merge with afc0557 (Oct 01) (5)
@mgehre-amd mgehre-amd merged commit 038de4e into feature/fused-ops Jan 13, 2025
43 of 44 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_357c1970 branch January 13, 2025 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.