[AutoBump] Merge with 8a9921f5 (Oct 23) (17) #454

Open
wants to merge 69 commits into
base: bump_to_519eef3b

jorickert

No description provided.

jsji and others added 30 commits October 22, 2024 17:39
This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on.
Built by GCC 11.

Refactor the code to avoid the false warning

llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp
llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp: In
function ‘int LLVMFuzzerInitialize(int*, char***)’:
llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp:141:43:
error: ISO C++ forbids zero-size array ‘argv’ [-Werror=pedantic]
  141 |   ExitOnError ExitOnErr(std::string(*argv[0]) + ": error:");
      |
Clang uses timestamp files to track the last time an implicitly-built
PCM file was verified to be up-to-date with regard to its inputs. With
`-fbuild-session-{file,timestamp}=` and
`-fmodules-validate-once-per-build-session` this reduces the number of
times a PCM file is checked per "build session".

The behavior I'm seeing with the current scheme is that when lots of
Clang instances wait for the same PCM to be built, they race to validate
it as soon as the file lock gets released, causing lots of concurrent
IO.

This patch makes it so that the timestamp is written by the same Clang
instance responsible for building the PCM while still holding the lock.
This makes it so that whenever a PCM file gets compiled, it's never
re-validated in the same build session.

I believe this is as sound as the current scheme. One thing to be aware
of is that there might be a time interval between accessing input file N
and writing the timestamp file, where changes to input files 0..<N would
not result in a rebuild. Since this is also the case with the current
scheme, I'm not too concerned about that.

I've seen this speed up `clang-scan-deps` by ~27%.
llvm#112904 will add typechecking to submulticlass arguments, and these
ones are currently mistyped.
We already have the .o; there is no reason to go .o -> YAML -> .o.
…13350)

This corrects a couple of off-by-one errors related to the sampling of
**instrumented** counters, and enables setting 100% rates for burst
sampling (burst duration = period).

Off-by-one errors:
Prior to this change it was impossible to set a period of 65535 because
this was converted to fast sampling, which rolls over at USHRT_MAX + 1
(65536). Similarly, burst sampling would collect burst duration + 1
counts because it used a ULE comparison.

100% sampling:
Although this is not useful for a productionized use case, it does allow
for more deterministic testing with the sampling checks in place. After
all the off-by-one errors are fixed, allowing for 100% sampling is a
matter of letting burst duration = period.
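
The burst off-by-one can be illustrated with a small simulation. This is only a sketch, not the instrumentation pass's code: `countSampled`, its parameters, and the wrapping counter are hypothetical. It counts how many of `events` executions are recorded when a counter cycling through `[0, period)` is compared against the burst duration with either `<=` (ULE) or `<` (ULT).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of burst sampling: a counter cycles through
// 0 .. period-1, and an event is recorded while the counter is inside the
// burst window. `inclusive` selects a ULE (<=) comparison instead of ULT (<).
uint64_t countSampled(uint64_t events, uint32_t period, uint32_t burst,
                      bool inclusive) {
  assert(period > 0 && burst <= period);
  uint64_t sampled = 0;
  uint32_t counter = 0;
  for (uint64_t i = 0; i < events; ++i) {
    bool inWindow = inclusive ? counter <= burst : counter < burst;
    if (inWindow)
      ++sampled;
    if (++counter == period) // wrap at the end of each period
      counter = 0;
  }
  return sampled;
}
```

In this model, a period of 100 with a burst of 10 records 11 events per period under the inclusive comparison and 10 under the strict one. With the strict comparison, setting the burst equal to the period records every event, which corresponds to the 100% sampling case.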
With sampled instrumentation (llvm#69535), profile counts may appear corrupt
and `fixFuncEntryCount` may assert. In particular a function can have a
0 block count for its entry, while later blocks are non zero. This is
only likely to happen for colder functions, so it is reasonable to take
any action that does not crash. Here we simply bail from fixing the
entry count.
…0569)

Extend the logic added in 123c036
(llvm#76612) to support pointers to
non-builtin types by using the mangled name of the canonical type.

PR: llvm#110569
…cessible outside of Sema (llvm#113206)

Moves `IsIntangibleType` from SemaHLSL to Type class and renames it to
`isHLSLIntangibleType`. The existing `isHLSLIntangibleType` is renamed
to `isHLSLBuiltinIntangibleType` and updated to return true only for the
builtin `__hlsl_resource_t` type.

This change makes `isHLSLIntangibleType` functionality accessible
outside of Sema, for example from clang CodeGen.
Add support for ``llvm.nvvm.fshl.clamp`` and ``llvm.nvvm.fshr.clamp``
intrinsics. These intrinsics are similar to the generic llvm funnel
shift, except that the shift value is clamped to the integer width.
Currently only ``i32`` is supported and is implemented with the
`shf.[rl].clamp.b32` PTX instruction.
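
As a rough illustration of the clamped semantics, the sketch below models both operations on 32-bit values in plain C++. The function names and the `(hi, lo, amt)` operand order are assumptions chosen to mirror the generic funnel shift; this is not the NVPTX lowering itself.

```cpp
#include <algorithm>
#include <cstdint>

// Funnel shift left with a clamped amount: the upper 32 bits of
// (hi:lo) << min(amt, 32).
uint32_t fshlClamp(uint32_t hi, uint32_t lo, uint32_t amt) {
  uint32_t n = std::min<uint32_t>(amt, 32);
  uint64_t concat = (static_cast<uint64_t>(hi) << 32) | lo;
  return static_cast<uint32_t>((concat << n) >> 32);
}

// Funnel shift right with a clamped amount: the lower 32 bits of
// (hi:lo) >> min(amt, 32).
uint32_t fshrClamp(uint32_t hi, uint32_t lo, uint32_t amt) {
  uint32_t n = std::min<uint32_t>(amt, 32);
  uint64_t concat = (static_cast<uint64_t>(hi) << 32) | lo;
  return static_cast<uint32_t>(concat >> n);
}
```

The difference from the generic funnel shift shows up for amounts of 32 or more: the generic form takes the amount modulo the bit width, so a shift by 32 returns the first operand unchanged, whereas the clamped form saturates at 32 and returns the other operand.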
…2802)

Store Swift mangled names in DW_AT_linkage_name. The Swift compiler
emits only the type mangled name in debug information, and LLDB uses
those mangled names as keys to look up size, alignment, fields, etc.
from either reflection metadata or Swift modules.

Additionally, emit linkage names for types into the accelerator
table if they exist and they're different from the display name.
…e with flexible array init (llvm#113336)

Fixes: llvm#113187
Avoid creating an init function, since Clang does not support global
variables with flexible array initializers. Creating one causes an
assertion failure later.
This patch adds functionality for atomically reading `llvm.struct`
types.

Fixes: llvm#93441
…ds (llvm#113264)

Looks like having a constant in `Z` also caused infinite loops. This
fixes llvm#113240.
…ructs (llvm#113045)

According to OpenMPv5.2 1.2.6, "For Fortran, a scalar variable with
intrinsic type, as defined by the base language, excluding character
type.". Likewise, section 4.3.1.3 states that atomic operations are on
"scalar variables of intrinsic type". This PR hence introduces a check
to error out when CHARACTER type is used in atomic operations.

Fixes llvm#112918
…lvm#113108)

Restricts the verifier for tensor.pack and tensor.unpack Ops so that the
following is no longer allowed:

```mlir
  %c8 = arith.constant 8 : index
  %0 = tensor.pack %input inner_dims_pos = [0, 1] inner_tiles = [8, %c8] into %output : tensor<?x?xf32> -> tensor<?x?x8x8xf32>
```

Specifically, in line with other Tensor Ops, require:
  * a dynamic dimension for each (dynamic) SSA value,
  * a static dimension for each static size (attribute).

In the example above, a static dimension (8) is mixed with a dynamic
size (%c8).

Note that this is mostly deleting existing code - that's because this
change simplifies the logic in the verifier.

For more context:
* https://discourse.llvm.org/t/tensor-ops-with-dynamic-sizes-which-behaviour-is-more-correct
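
The rule above can be sketched as a standalone check. This is a simplified model, not the MLIR verifier: `kDynamic`, `Tile`, and `tilesMatchShape` are hypothetical names, with a dynamic (SSA value) tile represented as `std::nullopt` and a dynamic dimension as a sentinel.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical stand-in for a "dynamic size" sentinel.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// An inner tile is either a static size (an attribute) or an SSA value,
// modeled here as std::nullopt.
using Tile = std::optional<int64_t>;

// Returns true if every dynamic tile lines up with a dynamic dimension and
// every static tile lines up with a static dimension of the same size.
bool tilesMatchShape(const std::vector<Tile> &tiles,
                     const std::vector<int64_t> &innerDims) {
  if (tiles.size() != innerDims.size())
    return false;
  for (size_t i = 0; i < tiles.size(); ++i) {
    bool dimIsDynamic = innerDims[i] == kDynamic;
    if (!tiles[i].has_value()) {
      if (!dimIsDynamic)
        return false; // SSA tile paired with a static dimension
    } else if (dimIsDynamic || innerDims[i] != *tiles[i]) {
      return false; // static tile paired with a dynamic or mismatched dim
    }
  }
  return true;
}
```

Under this model, the rejected example corresponds to tiles `[8, <SSA value>]` checked against inner dimensions `[8, 8]`, which fails on the second entry.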
This fixes output shape inference for the TOSA slice op when start/size
values are out of bounds or -1.

Added tests to check:
  - size = -1
  - size is out of bound
  - start is out of bound

Signed-off-by: Tai Ly <[email protected]>
Fix ordering of checks in atomic02.f90.
…cl (llvm#113276)

This is more similar to the diagnostic output of the current interpreter.
The patch adds graceful handling of an incorrectly constructed MLIR
operation with fewer operands than expected.
…h-abs feature

This is to align with GAS. Additionally, there are some minor changes:
the definition and expansion process of the TLS_DESC pseudo-instruction
were modified in the same style.

Reviewed By: heiher

Pull Request: llvm#112858
DavidSpickett and others added 30 commits October 23, 2024 09:06
Fixes llvm#113154

The encodings used for llvm.trap() on ARM were all marked as barriers
and terminators. This led to stack frame destroy code being inserted
before the trap if the trap was the last thing in the function and it
had no return statement.
```
void fn() {
  volatile int i = 0;
  __builtin_trap();
}
```
Produced:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        mov     sp, r11
        pop     {r11, lr}   << stack frame destroy
        .inst   0xe7ffdefe  << trap
        bx      lr
```
None of the other targets mark them this way; instead they mark them
with isTrap. I've changed ARM to do this, which fixes the code
generation:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        .inst   0xe7ffdefe  << trap
        mov     sp, r11
        pop     {r11, lr}   << stack frame destroy
        bx      lr
```
I've updated the existing trap test to force the need for a stack frame,
then check that the instruction immediately after the trap is resetting
the stack pointer.

debugtrap was already working but I've added the same checks for it
anyway.
…literal in StackAddressEscape

This patch simplifies the diagnostic message of the
core.StackAddressEscape checker for stack memory associated with
compound literals by removing the redundant "returned to caller" suffix.
Example: https://godbolt.org/z/KxM67vr7c

```c
// clang --analyze -Xanalyzer -analyzer-checker=core.StackAddressEscape
void* compound_literal() {
  return &(unsigned short){((unsigned short)0x22EF)};
}
```

warning: Address of stack memory associated with a compound literal
declared on line 2 **returned to caller returned to caller**
[core.StackAddressEscape]
This PR updates the cast to bool from IntN to treat any non-zero value
as TRUE. This makes the cast more resilient to non-generic (i.e. "non
1") TRUE values.

Signed-off-by: Dmitriy Smirnov <[email protected]>
…3305)

Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
… docs (llvm#112869)

* Note up front that the author may not have permissions to use the
merge button and should ask a reviewer to do those steps.
* Make it clear that a single commit PR can be landed with a single
button click.
* There are in fact 3 ways to land a multi-commit PR.
* Order the ways in increasing amount of overhead for the PR author.
* Put them in bullet point sections so they are visually separate.
* Add a note that force pushes can be problematic when the PR has multiple authors, but don't go too much into how to solve that, Git's docs are better here anyway.
Until now, these options have been hardcoded as downstream patches in
LLD. Add them to the driver so that the private patches can be removed.

PS5 only. The implementation of these behaviours will remain in the
proprietary linker on PS4.

SIE tracker: TOOLCHAIN-16704
hlfir.assign currently has `MemoryEffects<[MemWrite]>`, which makes it
look like it can write to anything. This is good for some cases where
the assign effect cannot be precisely described through the MLIR side
effect API (e.g., when the LHS is a descriptor and it is not possible to
get an OpOperand describing the data address, or when derived types are
involved and finalization could be called, or user defined assignment
for some components). For the most common case of hlfir.assign on
intrinsic types without whole allocatable LHS, this is pessimistic.

This patch implements a finer description of the side effects when
possible, and also adds the proper read/allocate/free effects when
relevant.

The ultimate goal is to suppress the generation of a temporary for the LHS
address when dealing with an assignment to a vector subscripted LHS
where the vector subscript is an array constructor that does not refer
to the LHS (as in `x([a,b]) = y`).

Two more patches will follow to enable this.
…oca (llvm#113321)

See https://reviews.llvm.org/D157626 for the rationale for declare having
side effects.

The write effect is too scary for passes that look for read/write effects
without caring about the resource affected. I know Slava asked for it,
but I think the creation of the `DebuggingResource` was enough and that
a write is too much. The alloca effect is sufficient to prevent DCE from
removing it, which is all we care about currently.

This is currently flagged as a reason for creating an LHS temporary in
assignments to vector subscripted entities with an array constructor.
There is a lot of read/write side effect analysis in the
"lower-hlfir-ordered-assignments" pass, and I feel like we will just
keep adding weird "debug resource" bypassing here and there with these
side effects.
…nt (llvm#113330)

Last patch required to avoid creating a temporary for the LHS when
dealing with `x([a,b]) = y`.

The code dealing with "ordered assignments" (where, forall, user and
vector subscripted assignments) saves the evaluated RHS/LHS and
masks if they have write effects, because these write effects should not
be evaluated when they affect entities that may be written to in other
contexts after the evaluation and before the re-evaluation.

But when dealing with a write to storage allocated in the region for the
expression being evaluated, there is no problem re-evaluating the write:
it has no effect outside of the expression evaluation that owns the
allocation.

In the case of `x([a,b]) = y`, the temporary is created for the vector
subscript. Raising the HLFIR abstraction for simple array constructors
may be a good idea, but local temps are created in other contexts, so
this fix is more generic.
…ons (llvm#113292)

This patch adds the zeroing predicate forms (Pg/z) of the following
instructions:
	- FCVTXNT
	- FCVTNT
	- FCVTLT
	- BFCVTNT

As specified in https://developer.arm.com/documentation/ddi0602. 

Co-authored-by: Spencer Abson <[email protected]>
…ls (llvm#113283)

On ARM64EC, external function calls emit a pair of weak-dependency
aliases: `func` to `#func` and `#func` to the `func` guest exit thunk
(instead of a single undefined `func` symbol, which would be emitted on
other targets). Allow such aliases to be overridden by lazy archive
symbols, just as we would for undefined symbols.
The Intel C++ Compiler (ICX) passes linker flags through the driver
unlike MSVC and clang-cl, and therefore needs them to be prefixed with
`/Qoption,link` (the equivalent of `-Wl,` for gcc on *nix).

Use the `LINKER:` prefix wherever supported by CMake; when that's not
possible, fall back to `${CMAKE_CXX_LINKER_WRAPPER_FLAG}`. CMake replaces
these with `/Qoption,link` for ICX and with the empty string for MSVC
and clang-cl.

For `target_link_libraries`, neither `LINKER:` (not supported prior to
CMake 3.32) nor `${CMAKE_CXX_LINKER_WRAPPER_FLAG}` (does not begin with
`-`, so it would be taken as a library name) works; use `-Qoption,link`
directly within a conditional generator expression that checks that we're
linking with ICX.

For MSVC and clang-cl no functional change is intended.

Tested by compiling with ICX and setting
`CMAKE_(EXE|SHARED|STATIC|MODULE)_LINKER_FLAGS_INIT` to
`-Werror=unknown-argument`.

RFC:
https://discourse.llvm.org/t/rfc-cmake-linker-flags-need-wl-equivalent-for-intel-c-icx-on-windows/82446
…lazy archive symbol to the symbol table on ARM64EC (llvm#113284)

On ARM64EC, a function symbol may appear in both mangled and demangled
forms:
- ARM64EC archives contain only the mangled name, while the demangled
symbol is defined by the object file as an alias.
- x86_64 archives contain only the demangled name (the mangled name is
usually defined by an object referencing the symbol as an alias to a
guest exit thunk).
- ARM64EC import files contain both the mangled and demangled names for
thunks.

If more than one archive defines the same function, this could lead to
different libraries being used for the same function depending on how
they are referenced. Avoid this by checking if the paired symbol is
already defined before adding a symbol to the table.
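
A minimal sketch of the paired-symbol check, under loud assumptions: `pairedName`, `SymbolTable`, and `addLazyArchive` are hypothetical and are not LLD's API, and only the simple `func` / `#func` pairing is modeled (mangling of C++ names is not handled).

```cpp
#include <set>
#include <string>

// Simplified pairing for plain names only: "func" <-> "#func".
std::string pairedName(const std::string &name) {
  if (!name.empty() && name[0] == '#')
    return name.substr(1);
  return "#" + name;
}

// Hypothetical symbol table that only tracks which names are present.
struct SymbolTable {
  std::set<std::string> names;

  // Adds a lazy archive symbol unless the same name or its paired form is
  // already present. Returns true if the symbol was added.
  bool addLazyArchive(const std::string &name) {
    if (names.count(name) || names.count(pairedName(name)))
      return false;
    names.insert(name);
    return true;
  }
};
```

With this model, once one archive has contributed `#func`, a later archive offering `func` is skipped, so both forms of the name resolve through the same library.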
…m#112928)

Member pointers refer to data or function members of a `CXXRecordDecl` and
require a `MSInheritanceAttr` in order to be complete. Without that we
cannot calculate their size in memory. The attempt has been causing a crash
further down in the clang AST context. In order to implement the feature,
DWARF will need a new attribute to convey the information. For the moment,
this patch teaches LLDB to handle the situation and avoid the crash.
…lvm#111130)

Before this patch, redundant COPY couldn't be removed for the following
case:
```
  $R0 = OP ...
  ... // Read of $R0
  $R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register
during backward propagation, so that we can remove the redundant COPY in
the above case and optimize it to:
```
  $R1 = OP ...
  ... // Replace all uses of $R0 with $R1
```
This PR merges large offsets into the base address loading.
llvm#113309)

llvm-cxxfilt can demangle names of data symbols, in addition to function
names.

    $ llvm-cxxfilt _ZN6garden5gnomeE
    garden::gnome

And type names too, on request:

    $ llvm-cxxfilt -t i
    int

Update some overly specific wording in the --help and documentation
that suggests otherwise.
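
The same distinction between symbol names and type encodings can be tried programmatically. The sketch below does not use llvm-cxxfilt or LLVM's demangler; it assumes a toolchain that provides `<cxxabi.h>` (GCC or Clang on an Itanium-ABI platform) and wraps `abi::__cxa_demangle`. The `demangle` helper is a hypothetical name.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium-ABI mangled name or type encoding. Returns the input
// unchanged if it cannot be demangled.
std::string demangle(const std::string &mangled) {
  int status = 0;
  char *out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr)
    return mangled;
  std::string result(out);
  std::free(out); // the buffer is allocated with malloc
  return result;
}
```

Calling `demangle("_ZN6garden5gnomeE")` handles the data symbol from the first example, and `demangle("i")` handles the type encoding from the second.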
This patch adds these new vector sizes for NEON: `mfloat8x16_t` and
`mfloat8x8_t`, according to ARM ACLE PR#323 [1].

[1] ARM-software/acle#323
llvm#111531)

Bot maintainers should be aware and it became too much of a burden
for developers. In particular on Windows, where make.exe won't be
found in Path typically.
…2867)

The Intel C++ Compiler (ICX) passes linker flags through the driver
unlike MSVC and clang-cl, and therefore needs them to be prefixed with
`/Qoption,link` (the equivalent of -Wl, for gcc on *nix).

Use the `LINKER:` prefix for the `/EXPORT:` options in clang-repl, this
expands to the correct flag for ICX and nothing for MSVC / clang-cl.

RFC:
https://discourse.llvm.org/t/rfc-cmake-linker-flags-need-wl-equivalent-for-intel-c-icx-on-windows/82446
These two veclibs are only available for AArch64 targets, and as
mentioned in https://discourse.llvm.org/t/rfc-should-fveclib-imply-fno-math-errno-for-all-targets/81384,
we (Arm) think that `-fveclib` should imply `-fno-math-errno`. By
setting `-fveclib` the user shows they intend to use the vector math
functions, which implies they don't care about errno. However,
currently, the vector mappings won't be used in many cases without
setting `-fno-math-errno` separately.

Making this change would also help resolve some inconsistencies in how
vector mappings are applied (see llvm#108980 (comment)).

Note: Both SLEEF and ArmPL state that they do not set `errno`:

- https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions
  * "The vector functions in libamath which are available on Linux may not set errno nor raise exceptions"
- https://sleef.org/2-references/libm/
  *  "These functions do not set errno nor raise an exception."
…llvm#113167)

Define `OmpIteratorSpecifier` and `OmpIteratorModifier` parser classes,
and add parsing for them. Those are reusable between any clauses that
use iterator modifiers.

Add support for iterator modifiers to the MAP clause up to lowering,
where a TODO message is emitted.
…at container-inserter does (llvm#113103)

This patch implements LWG4016: container-insertable checks do not match
what container-inserter does.
When compiling for an SVE target we can use INDEX to generate constant
fixed-length step vectors, e.g.:
```
uint32x4_t foo() {
  return (uint32x4_t){0, 1, 2, 3};
}
```
Currently:
```
foo():
        adrp    x8, .LCPI1_0
        ldr     q0, [x8, :lo12:.LCPI1_0]
        ret
```
With INDEX:
```
foo():
        index   z0.s, #0, #1
        ret
```

The logic for this was already in `LowerBUILD_VECTOR`, though it was
hidden under a check for `!Subtarget->isNeonAvailable()`. This patch
refactors this to enable the corresponding code path unconditionally for
constant step vectors (as long as we can use SVE for them).
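
To make "constant step vector" concrete, here is a standalone sketch of the kind of check involved. It is not the code in `LowerBUILD_VECTOR`; `matchStepVector` is a hypothetical helper that reports the start and step of a constant sequence, which are the two operands INDEX takes.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Returns {start, step} if `elts` is start, start+step, start+2*step, ...;
// std::nullopt otherwise. At least two elements are needed to define a step.
// Assumes element differences do not overflow int64_t.
std::optional<std::pair<int64_t, int64_t>>
matchStepVector(const std::vector<int64_t> &elts) {
  if (elts.size() < 2)
    return std::nullopt;
  int64_t step = elts[1] - elts[0];
  for (size_t i = 2; i < elts.size(); ++i)
    if (elts[i] - elts[i - 1] != step)
      return std::nullopt;
  return std::make_pair(elts[0], step);
}
```

For the `{0, 1, 2, 3}` vector above this yields a start of 0 and a step of 1, matching the `index z0.s, #0, #1` output. A real lowering would additionally have to confirm that both values are encodable in the instruction.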