Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EarlyCSE] Fix improper icmp simplification #122043

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

llvmssh
Copy link
Contributor

@llvmssh llvmssh commented Jan 8, 2025

When icmp's argument is a volatile instruction, we do not assume that icmp can be simplified.

llvmssh and others added 4 commits December 2, 2024 17:42
    When the ModuleSymbolTable is generated, the
    binary consistency problem occurs due to the
    unorder of the data structure when collecting
    asm symbols.
Check: the icmp parameter is volatile.
@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: None (llvmssh)

Changes

When icmp's argument is a volatile instruction, we do not assume that icmp can be simplified.


Full diff: https://github.com/llvm/llvm-project/pull/122043.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+15)
  • (added) llvm/test/Transforms/EarlyCSE/icmp_with_volatile.ll (+40)
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 515806428cbb29..db5156fcab9c1a 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -3726,6 +3726,21 @@ static Value *simplifyICmpInst(CmpPredicate Pred, Value *LHS, Value *RHS,
                                const SimplifyQuery &Q, unsigned MaxRecurse) {
   assert(CmpInst::isIntPredicate(Pred) && "Not an integer compare!");
 
+  // volatile check
+  // %mark = load volatile ptr, ptr %1, align 8
+  // icmp eq ptr %mark, null
+  if (const auto LHSInstr = dyn_cast<Instruction>(LHS)) {
+     if (LHSInstr->isVolatile()) {
+       return nullptr;
+     }
+  }
+
+  if (const auto RHSInstr = dyn_cast<Instruction>(RHS)) { 
+     if (RHSInstr->isVolatile()) {
+        return nullptr;
+     }
+  }
+
   if (Constant *CLHS = dyn_cast<Constant>(LHS)) {
     if (Constant *CRHS = dyn_cast<Constant>(RHS))
       return ConstantFoldCompareInstOperands(Pred, CLHS, CRHS, Q.DL, Q.TLI);
diff --git a/llvm/test/Transforms/EarlyCSE/icmp_with_volatile.ll b/llvm/test/Transforms/EarlyCSE/icmp_with_volatile.ll
new file mode 100644
index 00000000000000..d85d56d50fb4a6
--- /dev/null
+++ b/llvm/test/Transforms/EarlyCSE/icmp_with_volatile.ll
@@ -0,0 +1,40 @@
+; RUN: opt < %s -S -passes=early-cse | FileCheck %s
+
+;When icmp's argument is a volatile instruction, we do not assume that icmp can
+;be simplified.
+define dso_local noundef i32 @_Z1funcv() {
+; CHECK-LABEL: @_Z1funcv(
+; CHECK-NEXT:    %1 = alloca ptr, align 8
+; CHECK-NEXT:    store volatile ptr null, ptr %1, align 8
+; CHECK-NEXT:    %2 = load volatile ptr, ptr %1, align 8
+; CHECK-NEXT:    %3 = ptrtoint ptr %2 to i64
+; CHECK-NEXT:    %4 = and i64 %3, 1
+; CHECK-NEXT:    %5 = icmp eq i64 %4, 0
+; CHECK-NEXT:    %6 = zext i1 %5 to i32
+; CHECK-NEXT:    ret i32 %6
+;
+  %1 = alloca ptr, align 8
+  store volatile ptr null, ptr %1, align 8
+  %2 = load volatile ptr, ptr %1, align 8
+  %3 = ptrtoint ptr %2 to i64
+  %4 = and i64 %3, 1
+  %5 = icmp eq i64 %4, 0
+  %6 = zext i1 %5 to i32
+  ret i32 %6
+}
+
+define dso_local noundef i32 @_Z2funcv() {
+; CHECK-LABEL: @_Z2funcv(
+; CHECK-NEXT:    %1 = alloca ptr, align 8
+; CHECK-NEXT:    store ptr null, ptr %1, align 8
+; CHECK-NEXT:    ret i32 1
+;
+  %1 = alloca ptr, align 8
+  store ptr null, ptr %1, align 8
+  %2 = load ptr, ptr %1, align 8
+  %3 = ptrtoint ptr %2 to i64
+  %4 = and i64 %3, 1
+  %5 = icmp eq i64 %4, 0
+  %6 = zext i1 %5 to i32
+  ret i32 %6
+}

Copy link

github-actions bot commented Jan 8, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff a0f5bbcfb71a28cd3eaa308250af63a0889a1c85 dc7c72cef6a0f98d7361199ff4eed4864fb87323 --extensions cpp -- llvm/lib/Analysis/InstructionSimplify.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index db5156fcab..629cb484e3 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -3730,15 +3730,15 @@ static Value *simplifyICmpInst(CmpPredicate Pred, Value *LHS, Value *RHS,
   // %mark = load volatile ptr, ptr %1, align 8
   // icmp eq ptr %mark, null
   if (const auto LHSInstr = dyn_cast<Instruction>(LHS)) {
-     if (LHSInstr->isVolatile()) {
-       return nullptr;
-     }
+    if (LHSInstr->isVolatile()) {
+      return nullptr;
+    }
   }
 
-  if (const auto RHSInstr = dyn_cast<Instruction>(RHS)) { 
-     if (RHSInstr->isVolatile()) {
-        return nullptr;
-     }
+  if (const auto RHSInstr = dyn_cast<Instruction>(RHS)) {
+    if (RHSInstr->isVolatile()) {
+      return nullptr;
+    }
   }
 
   if (Constant *CLHS = dyn_cast<Constant>(LHS)) {

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://godbolt.org/z/59v83q7v7 If I don't miss something, there are no test changes after this patch.
Can you provide an alive2 link to demonstrate that there is something wrong in simplifyICmpInst?

@llvmssh
Copy link
Contributor Author

llvmssh commented Jan 8, 2025

https://godbolt.org/z/59v83q7v7 If I don't miss something, there are no test changes after this patch. Can you provide an alive2 link to demonstrate that there is something wrong in simplifyICmpInst?

Thanks for the reply, I'll check the original question.

@llvmssh
Copy link
Contributor Author

llvmssh commented Jan 8, 2025

https://godbolt.org/z/59v83q7v7 If I don't miss something, there are no test changes after this patch. Can you provide an alive2 link to demonstrate that there is something wrong in simplifyICmpInst?

My case is not the same as the original, I will update it later.

@llvmssh
Copy link
Contributor Author

llvmssh commented Jan 8, 2025

https://godbolt.org/z/59v83q7v7 If I don't miss something, there are no test changes after this patch. Can you provide an alive2 link to demonstrate that there is something wrong in simplifyICmpInst?

My case is not the same as the original, I will update it later.

This link shows the scenario where I actually had problems, and the original application code looks like this, multi-threaded environment. : https://godbolt.org/z/1MfndEMMs

@dtcxzyw
Copy link
Member

dtcxzyw commented Jan 8, 2025

https://godbolt.org/z/1MfndEMMs

In this case, %4 is passed into _ZN1A5valueEv which has noundef and nonnull/dereferenceable parameter attributes. The optimizer will assume %4 is nonnull in the subsequent execution path.

Without noundef and nonnull/dereferenceable, the icmp will not be optimized out: https://godbolt.org/z/chhxE3Txr

@llvmssh
Copy link
Contributor Author

llvmssh commented Jan 8, 2025

https://godbolt.org/z/1MfndEMMs

In this case, %4 is passed into _ZN1A5valueEv which has noundef and nonnull/dereferenceable parameter attributes. The optimizer will assume %4 is nonnull in the subsequent execution path.

Without noundef and nonnull/dereferenceable, the icmp will not be optimized out: https://godbolt.org/z/chhxE3Txr

Yes, icmp is simplified with its information. Should we add the check of volatile to icmp simplify?

@dtcxzyw
Copy link
Member

dtcxzyw commented Jan 8, 2025

https://godbolt.org/z/1MfndEMMs

In this case, %4 is passed into _ZN1A5valueEv which has noundef and nonnull/dereferenceable parameter attributes. The optimizer will assume %4 is nonnull in the subsequent execution path.
Without noundef and nonnull/dereferenceable, the icmp will not be optimized out: https://godbolt.org/z/chhxE3Txr

Yes, icmp is simplified with its information. Should we add the check of volatile to icmp simplify?

I mean this transformation is correct. If the loaded pointer %4 is nonnull, it hits UB when calling _ZN1A5valueEv.

@llvmssh
Copy link
Contributor Author

llvmssh commented Jan 8, 2025

https://godbolt.org/z/1MfndEMMs

In this case, %4 is passed into _ZN1A5valueEv which has noundef and nonnull/dereferenceable parameter attributes. The optimizer will assume %4 is nonnull in the subsequent execution path.
Without noundef and nonnull/dereferenceable, the icmp will not be optimized out: https://godbolt.org/z/chhxE3Txr

Yes, icmp is simplified with its information. Should we add the check of volatile to icmp simplify?

I mean this transformation is correct. If the loaded pointer %4 is nonnull, it hits UB when calling _ZN1A5valueEv.

https://godbolt.org/z/1MfndEMMs

In this case, %4 is passed into _ZN1A5valueEv which has noundef and nonnull/dereferenceable parameter attributes. The optimizer will assume %4 is nonnull in the subsequent execution path.
Without noundef and nonnull/dereferenceable, the icmp will not be optimized out: https://godbolt.org/z/chhxE3Txr

Yes, icmp is simplified with its information. Should we add the check of volatile to icmp simplify?

I mean this transformation is correct. If the loaded pointer %4 is nonnull, it hits UB when calling _ZN1A5valueEv.

In actual testing, no matter if % 4 is set to null in a multithreaded environment, it seems that there is no running problem when calling _ZN1A5valueEv. Due to optimizations, %4 == null cannot be intercepted.

@dtcxzyw
Copy link
Member

dtcxzyw commented Jan 8, 2025

In actual testing, no matter if % 4 is set to null in a multithreaded environment, it seems that there is no running problem when calling _ZN1A5valueEv. Due to optimizations, %4 == null cannot be intercepted.

Can you share the C source code via godbolt?

@llvmssh
Copy link
Contributor Author

llvmssh commented Jan 8, 2025

In actual testing, no matter if % 4 is set to null in a multithreaded environment, it seems that there is no running problem when calling _ZN1A5valueEv. Due to optimizations, %4 == null cannot be intercepted.

Can you share the C source code via godbolt?
This is the source code link.: https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/runtime/synchronizer.cpp
Function: ObjectSynchronizer::inflate

@dtcxzyw
Copy link
Member

dtcxzyw commented Jan 8, 2025

In actual testing, no matter if % 4 is set to null in a multithreaded environment, it seems that there is no running problem when calling _ZN1A5valueEv. Due to optimizations, %4 == null cannot be intercepted.

Can you share the C source code via godbolt?
This is the source code link.: https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/runtime/synchronizer.cpp
Function: ObjectSynchronizer::inflate

Sorry, I don't have enough bandwidth to provide a reproducer. All I can tell is that EarlyCSE is correct in https://godbolt.org/z/chhxE3Txr. But I am not sure if the source code contains UB, or there is a pass that misoptimizes the program before EarlyCSE. You can rebuild hotspot with ubsan. After confirming that it is a compiler bug, please file an issue to track this.

@llvmssh
Copy link
Contributor Author

llvmssh commented Jan 8, 2025

You can rebuild hotspot with ubsan.

OK, thanks. Because the value of mark is accessed or assigned by multiple threads, where mark == 0, I understand that the optimization changes the code logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants