@VindhyaP312

Fixes #116695

This patch ports the custom rem (remainder) DAG combine from the NVPTX backend (NVPTXISelLowering.cpp) into the generic DAGCombiner. The optimization is a CSE pattern that folds A % B into A - (A / B) * B if the quotient (A / B) is already computed.

This move allows all targets to benefit from the optimization and cleans up the NVPTX backend. The generic logic now includes an isIntDivCheap guard to prevent conflicts with target-specific division optimizations.
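
The fold described above rests on the identity `A % B == A - (A / B) * B`, which holds for truncating integer division in both the signed and unsigned cases. A minimal standalone sketch of that identity follows; the helper names `remFromQuotientSigned` and `remFromQuotientUnsigned` are hypothetical and are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM API): rebuild the remainder from the
// quotient, mirroring "Num % Den -> Num - (Num / Den) * Den".
// C++ integer division truncates toward zero, so the result matches '%'.
constexpr int32_t remFromQuotientSigned(int32_t A, int32_t B) {
  return A - (A / B) * B;
}

constexpr uint32_t remFromQuotientUnsigned(uint32_t A, uint32_t B) {
  return A - (A / B) * B;
}

// Compile-time spot checks against the built-in remainder operator.
static_assert(remFromQuotientSigned(-17, 5) == -17 % 5, "signed, negative numerator");
static_assert(remFromQuotientSigned(17, -5) == 17 % -5, "signed, negative denominator");
static_assert(remFromQuotientUnsigned(17u, 5u) == 17u % 5u, "unsigned");
```

When the quotient already exists, the rewritten form costs one multiply and one subtract instead of a second division, which is presumably why the fold is skipped when the target reports that integer division is cheap.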

@llvmbot added the backend:NVPTX and llvm:SelectionDAG labels Nov 8, 2025
@llvmbot
Member

llvmbot commented Nov 8, 2025

@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-llvm-selectiondag

Author: None (VindhyaP312)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/167147.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+38)
  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (-34)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index f144f17d5a8f2..867f30985ad4e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -900,6 +900,41 @@ namespace {
                          ISD::NodeType ExtType);
   };
 
+/// Generic remainder optimization : Folds a remainder operation (A % B) by reusing the computed quotient (A / B).
+static SDValue PerformREMCombineGeneric(SDNode *N, DAGCombiner &DC,
+                                        CodeGenOptLevel OptLevel) {
+  assert(N->getOpcode() == ISD::SREM || N->getOpcode() == ISD::UREM);
+
+  // Don't do anything at less than -O2.
+  if (OptLevel < CodeGenOptLevel::Default)
+    return SDValue();
+
+  SelectionDAG &DAG = DC.getDAG();
+  SDLoc DL(N);
+  EVT VT = N->getValueType(0);
+  bool IsSigned = N->getOpcode() == ISD::SREM;
+  unsigned DivOpc = IsSigned ? ISD::SDIV : ISD::UDIV;
+
+  const SDValue &Num = N->getOperand(0);
+  const SDValue &Den = N->getOperand(1);
+  
+  AttributeList Attr = DC.getDAG().getMachineFunction().getFunction().getAttributes();
+  if (DC.getDAG().getTargetLoweringInfo().isIntDivCheap(N->getValueType(0), Attr))
+    return SDValue();
+
+  for (const SDNode *U : Num->users()) {
+    if (U->getOpcode() == DivOpc && U->getOperand(0) == Num &&
+        U->getOperand(1) == Den) {
+      // Num % Den -> Num - (Num / Den) * Den
+      return DAG.getNode(ISD::SUB, DL, VT, Num,
+                         DAG.getNode(ISD::MUL, DL, VT,
+                                     DAG.getNode(DivOpc, DL, VT, Num, Den),
+                                     Den));
+    }
+  }
+  return SDValue();
+}
+
 /// This class is a DAGUpdateListener that removes any deleted
 /// nodes from the worklist.
 class WorklistRemover : public SelectionDAG::DAGUpdateListener {
@@ -5400,6 +5435,9 @@ SDValue DAGCombiner::visitREM(SDNode *N) {
   if (SDValue NewSel = foldBinOpIntoSelect(N))
     return NewSel;
 
+  if (SDValue V = PerformREMCombineGeneric(N, *this, OptLevel))
+    return V;
+  
   if (isSigned) {
     // If we know the sign bits of both operands are zero, strength reduce to a
     // urem instead.  Handles (X & 0x0FFFFFFF) %s 16 -> X&15
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index a3deb36074e68..a3cbb09297f24 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -5726,37 +5726,6 @@ static SDValue PerformFMinMaxCombine(SDNode *N,
   return SDValue();
 }
 
-static SDValue PerformREMCombine(SDNode *N,
-                                 TargetLowering::DAGCombinerInfo &DCI,
-                                 CodeGenOptLevel OptLevel) {
-  assert(N->getOpcode() == ISD::SREM || N->getOpcode() == ISD::UREM);
-
-  // Don't do anything at less than -O2.
-  if (OptLevel < CodeGenOptLevel::Default)
-    return SDValue();
-
-  SelectionDAG &DAG = DCI.DAG;
-  SDLoc DL(N);
-  EVT VT = N->getValueType(0);
-  bool IsSigned = N->getOpcode() == ISD::SREM;
-  unsigned DivOpc = IsSigned ? ISD::SDIV : ISD::UDIV;
-
-  const SDValue &Num = N->getOperand(0);
-  const SDValue &Den = N->getOperand(1);
-
-  for (const SDNode *U : Num->users()) {
-    if (U->getOpcode() == DivOpc && U->getOperand(0) == Num &&
-        U->getOperand(1) == Den) {
-      // Num % Den -> Num - (Num / Den) * Den
-      return DAG.getNode(ISD::SUB, DL, VT, Num,
-                         DAG.getNode(ISD::MUL, DL, VT,
-                                     DAG.getNode(DivOpc, DL, VT, Num, Den),
-                                     Den));
-    }
-  }
-  return SDValue();
-}
-
 // (sign_extend|zero_extend (mul|shl) x, y) -> (mul.wide x, y)
 static SDValue combineMulWide(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
                               CodeGenOptLevel OptLevel) {
@@ -6428,9 +6397,6 @@ SDValue NVPTXTargetLowering::PerformDAGCombine(SDNode *N,
     return PerformSETCCCombine(N, DCI, STI.getSmVersion());
   case ISD::SHL:
     return PerformSHLCombine(N, DCI, OptLevel);
-  case ISD::SREM:
-  case ISD::UREM:
-    return PerformREMCombine(N, DCI, OptLevel);
   case ISD::STORE:
   case NVPTXISD::StoreV2:
   case NVPTXISD::StoreV4:
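
The core of the combine in the diff above is a scan over the users of the numerator, looking for a division with the same signedness and the same two operands in the same order. The toy model below sketches only that matching logic. `Op`, `Node`, and `findMatchingDiv` are hypothetical stand-ins, not LLVM's `SDNode`/`SDValue` API, and the model ignores everything else the real combiner has to handle (value types, CSE of the rebuilt nodes, worklist updates).

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR, for illustration only.
enum class Op { Leaf, SDiv, UDiv, SRem, URem };

struct Node {
  Op Opcode;
  const Node *LHS;
  const Node *RHS;
  std::vector<const Node *> Users; // nodes that use this node as an operand

  // Building a binary node registers it as a user of its operands.
  explicit Node(Op O, Node *L = nullptr, Node *R = nullptr)
      : Opcode(O), LHS(L), RHS(R) {
    if (L)
      L->Users.push_back(this);
    if (R && R != L)
      R->Users.push_back(this);
  }

  // Users hold raw pointers, so nodes must stay at a fixed address.
  Node(const Node &) = delete;
  Node &operator=(const Node &) = delete;
};

// Returns the existing division that computes Rem's quotient, or nullptr.
// A match requires the same signedness and the same (Num, Den) operand order.
inline const Node *findMatchingDiv(const Node &Rem) {
  if (Rem.Opcode != Op::SRem && Rem.Opcode != Op::URem)
    return nullptr;
  const Op DivOpc = Rem.Opcode == Op::SRem ? Op::SDiv : Op::UDiv;
  const Node *Num = Rem.LHS;
  const Node *Den = Rem.RHS;
  for (const Node *U : Num->Users)
    if (U->Opcode == DivOpc && U->LHS == Num && U->RHS == Den)
      return U;
  return nullptr;
}
```

In this model, a signed remainder of `(A, B)` matches an existing signed division of `(A, B)`, but not an unsigned division or a division with swapped operands, which mirrors the three conditions in the `if` inside the loop of the patch.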

const SDValue &Num = N->getOperand(0);
const SDValue &Den = N->getOperand(1);

AttributeList Attr = DC.getDAG().getMachineFunction().getFunction().getAttributes();
Contributor

Suggested change
-  AttributeList Attr = DC.getDAG().getMachineFunction().getFunction().getAttributes();
+  AttributeList Attr = DAG.getMachineFunction().getFunction().getAttributes();

Comment on lines +930 to +932
DAG.getNode(ISD::MUL, DL, VT,
DAG.getNode(DivOpc, DL, VT, Num, Den),
Den));
Contributor

Use temporary variables to reduce ugly line wrap

Comment on lines +918 to +919
const SDValue &Num = N->getOperand(0);
const SDValue &Den = N->getOperand(1);
Contributor

Suggested change
-  const SDValue &Num = N->getOperand(0);
-  const SDValue &Den = N->getOperand(1);
+  const SDValue Num = N->getOperand(0);
+  const SDValue Den = N->getOperand(1);

};

/// Generic remainder optimization : Folds a remainder operation (A % B) by reusing the computed quotient (A / B).
static SDValue PerformREMCombineGeneric(SDNode *N, DAGCombiner &DC,
Contributor

Suggested change
-static SDValue PerformREMCombineGeneric(SDNode *N, DAGCombiner &DC,
+static SDValue performREMCombineGeneric(SDNode *N, DAGCombiner &DC,

};

/// Generic remainder optimization : Folds a remainder operation (A % B) by reusing the computed quotient (A / B).
static SDValue PerformREMCombineGeneric(SDNode *N, DAGCombiner &DC,
Contributor

Passing in DAGCombiner is strange. Pass in the DAG and TLI instead, or just make this a DAGCombiner member?

Successfully merging this pull request may close these issues.

[NVPTX] Port code to llvm/lib/CodeGen/SelectionDAG/*

4 participants