@Himadhith
Contributor

@Himadhith Himadhith commented Dec 10, 2025

Lock down the instructions generated for vector compares of the form not equal to a non-zero value (e.g. vec[i] != 7). The current implementation can be improved by removing the negation and using the identities 0xFFFF + 1 = 0 and 0 + 1 = 1.
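
To illustrate the identity, here is a minimal Python sketch that models 16-bit lanes. It is not the backend's implementation: the function names `ne_via_negation` and `ne_via_add_one` are hypothetical, and the "negation" route is only one plausible way to model the longer sequence. A compare-equal produces a lane mask of 0xFFFF (equal) or 0 (not equal); adding 1 with wraparound turns that mask directly into the 0/1 value of "not equal".

```python
MASK16 = 0xFFFF  # lanes modelled as 16-bit halfwords (assumption for this sketch)


def ne_via_negation(vec, k):
    """Compare-equal, invert the mask, then keep the low bit (longer route)."""
    out = []
    for x in vec:
        eq_mask = MASK16 if x == k else 0   # compare-equal lane mask
        ne_mask = ~eq_mask & MASK16         # explicit negation of the mask
        out.append(ne_mask & 1)             # 0xFFFF -> 1, 0 -> 0
    return out


def ne_via_add_one(vec, k):
    """Compare-equal, then add 1 with 16-bit wraparound (no negation)."""
    out = []
    for x in vec:
        eq_mask = MASK16 if x == k else 0   # compare-equal lane mask
        out.append((eq_mask + 1) & MASK16)  # 0xFFFF + 1 -> 0, 0 + 1 -> 1
    return out
```

Both functions return the same 0/1 lanes for any input, which is why the explicit negation can be dropped.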

@llvmbot
Member

llvmbot commented Dec 10, 2025

@llvm/pr-subscribers-backend-powerpc

Author: None (Himadhith)

Changes

Lock down the instructions generated for vector compares of the form not equal to a non-zero value. The current implementation can be improved by removing the negation and using the identities 0xFFFF + 1 = 0 and 0 + 1 = 1.


Full diff: https://github.com/llvm/llvm-project/pull/171635.diff

1 Files Affected:

  • (added) llvm/test/CodeGen/PowerPC/optimize-vector-not-equal.ll (+71)
diff --git a/llvm/test/CodeGen/PowerPC/optimize-vector-not-equal.ll b/llvm/test/CodeGen/PowerPC/optimize-vector-not-equal.ll
new file mode 100644
index 0000000000000..c3bb2d5ecc461
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/optimize-vector-not-equal.ll
@@ -0,0 +1,71 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64le-unknown-linux-gnu \
+; RUN:     -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_64LE
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64-ibm-aix \
+; RUN:     -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_64
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc-ibm-aix \
+; RUN:     -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC_32
+
+; The current implementation compares the input vector in register v2 with v3. v3 is then negated, which converts:
+; 0xFFFF -> 0
+; 0 -> 1
+; An optimized version will follow this NFC patch.
+
+define i32 @cols_needed(<4 x i16> %wide.load) {
+; POWERPC_64LE-LABEL: cols_needed:
+; POWERPC_64LE:       # %bb.0: # %entry
+; POWERPC_64LE-NEXT:    xxlxor v3, v3, v3
+; POWERPC_64LE-NEXT:    li r3, 0
+; POWERPC_64LE-NEXT:    vcmpequh v2, v2, v3
+; POWERPC_64LE-NEXT:    xxleqv v3, v3, v3
+; POWERPC_64LE-NEXT:    vmrglh v2, v2, v2
+; POWERPC_64LE-NEXT:    vsubuwm v2, v2, v3
+; POWERPC_64LE-NEXT:    xxswapd v3, v2
+; POWERPC_64LE-NEXT:    vadduwm v2, v2, v3
+; POWERPC_64LE-NEXT:    xxspltw v3, v2, 2
+; POWERPC_64LE-NEXT:    vadduwm v2, v2, v3
+; POWERPC_64LE-NEXT:    vextuwrx r3, r3, v2
+; POWERPC_64LE-NEXT:    blr
+;
+; POWERPC_64-LABEL: cols_needed:
+; POWERPC_64:       # %bb.0: # %entry
+; POWERPC_64-NEXT:    xxlxor v3, v3, v3
+; POWERPC_64-NEXT:    li r3, 0
+; POWERPC_64-NEXT:    vcmpequh v2, v2, v3
+; POWERPC_64-NEXT:    xxleqv v3, v3, v3
+; POWERPC_64-NEXT:    vmrghh v2, v2, v2
+; POWERPC_64-NEXT:    vsubuwm v2, v2, v3
+; POWERPC_64-NEXT:    xxswapd v3, v2
+; POWERPC_64-NEXT:    vadduwm v2, v2, v3
+; POWERPC_64-NEXT:    xxspltw v3, v2, 1
+; POWERPC_64-NEXT:    vadduwm v2, v2, v3
+; POWERPC_64-NEXT:    vextuwlx r3, r3, v2
+; POWERPC_64-NEXT:    blr
+;
+; POWERPC_32-LABEL: cols_needed:
+; POWERPC_32:       # %bb.0: # %entry
+; POWERPC_32-NEXT:    xxlxor v3, v3, v3
+; POWERPC_32-NEXT:    vcmpequh v2, v2, v3
+; POWERPC_32-NEXT:    xxleqv v3, v3, v3
+; POWERPC_32-NEXT:    vmrghh v2, v2, v2
+; POWERPC_32-NEXT:    vsubuwm v2, v2, v3
+; POWERPC_32-NEXT:    xxswapd v3, v2
+; POWERPC_32-NEXT:    vadduwm v2, v2, v3
+; POWERPC_32-NEXT:    xxspltw v3, v2, 1
+; POWERPC_32-NEXT:    vadduwm v2, v2, v3
+; POWERPC_32-NEXT:    stxv v2, -16(r1)
+; POWERPC_32-NEXT:    lwz r3, -16(r1)
+; POWERPC_32-NEXT:    blr
+entry:
+  %0 = icmp ne <4 x i16> %wide.load, zeroinitializer
+  %1 = zext <4 x i1> %0 to <4 x i32>
+  %2 = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %1)
+  ret i32 %2
+}
+
+; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
+declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>) #0
+
+attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
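
As a rough cross-check of what the test above pins down, the following Python sketch models the IR semantics of `cols_needed` (count of non-zero i16 lanes) next to a loose, lane-by-lane model of the checked instruction sequence. The helper name `cols_needed_like_asm` is hypothetical, and the model simplifies register layout and endianness, so treat it as an approximation rather than a description of the hardware.

```python
W = 0xFFFFFFFF  # 32-bit word mask


def cols_needed(lanes):
    """Reference semantics of the IR: icmp ne zero, zext to i32, reduce.add."""
    assert len(lanes) == 4
    return sum(1 for x in lanes if (x & 0xFFFF) != 0)


def cols_needed_like_asm(lanes):
    """Loose model of the checked sequence, one 32-bit word per input lane."""
    assert len(lanes) == 4
    total = 0
    for x in lanes:
        eq = 0xFFFF if (x & 0xFFFF) == 0 else 0  # vcmpequh against the zero vector
        word = (eq << 16) | eq                   # vmrglh/vmrghh v2, v2, v2 duplicates the halfword
        word = (word - W) & W                    # vsubuwm with all-ones (xxleqv): same as +1 mod 2**32
        total = (total + word) & W               # vadduwm steps of the reduction
    return total
```

In this model the subtraction of the all-ones vector is what maps an "equal" word (all ones) to 0 and a "not equal" word (0) to 1, so the final sum is the number of non-zero lanes.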

@github-actions

github-actions bot commented Dec 10, 2025

🐧 Linux x64 Test Results

  • 187330 tests passed
  • 4949 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions bot commented Dec 10, 2025

🪟 Windows x64 Test Results

  • 128590 tests passed
  • 2806 tests skipped

✅ The build succeeded and all tests passed.

@Himadhith Himadhith force-pushed the NFC/VCMP_NE_NON_ZERO_OPT branch from a37d89d to d056c90 Compare December 11, 2025 05:15
Contributor

@tonykuttai tonykuttai left a comment

LGTM

@Himadhith Himadhith merged commit c3e7a1a into llvm:main Dec 12, 2025
10 checks passed
@llvm-ci
Collaborator

llvm-ci commented Dec 12, 2025

LLVM Buildbot has detected a new failure on builder clang-armv7-global-isel running on linaro-clang-armv7-global-isel while building llvm at step 7 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/39/builds/9073

Here is the relevant piece of the build log for the reference
Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'LLVM :: CodeGen/X86/2009-03-23-MultiUseSched.ll' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 3
/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/llc < /home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll -mtriple=x86_64-linux -mcpu=corei7 -relocation-model=static | /home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll
# executed command: /home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/llc -mtriple=x86_64-linux -mcpu=corei7 -relocation-model=static
# .---command stderr------------
# | LLVM ERROR: out of memory
# | Allocation failed
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/llc -mtriple=x86_64-linux -mcpu=corei7 -relocation-model=static
# | 1.	Running pass 'Function Pass Manager' on module '<stdin>'.
# | 2.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@foo'
# | #0 0x03e1eda0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/llc+0x39ceda0)
# | #1 0x03e1c264 llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/llc+0x39cc264)
# | #2 0x03e1fd60 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
# | #3 0xe853d6f0 __default_rt_sa_restorer ./signal/../sysdeps/unix/sysv/linux/arm/sigrestorer.S:80:0
# | #4 0xe852db06 ./csu/../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47:0
# | #5 0xe856d292 __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
# | #6 0xe853c840 gsignal ./signal/../sysdeps/posix/raise.c:27:6
# `-----------------------------
# error: command failed with exit status: -6
# executed command: /home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line:  /home/tcwg-buildbot/worker/clang-armv7-global-isel/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv7-global-isel/llvm/llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll
# `-----------------------------
# error: command failed with exit status: 2

--

********************


anonymouspc pushed a commit to anonymouspc/llvm that referenced this pull request Dec 15, 2025
…tors (llvm#171635)

Lock down the instructions generated for vector compares of the form `not
equal to a non-zero value (e.g. vec[i] != 7)`. The current implementation can
be improved by removing the negation and using the identities ``` 0xFFFF + 1 = 0 and 0 + 1 = 1 ```

Co-authored-by: himadhith <himadhith.v@ibm.com>