-
Notifications
You must be signed in to change notification settings - Fork 75
[intel] improve pitch and width constexpr folding #5489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3102d07 to
0806403
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves constant expression folding for pitch and width parameters in Intel GPU block I/O operations. The changes introduce a more robust constant evaluation mechanism that handles multiple levels of type casts and operation folding, addressing issue #5338.
- Adds a new
tryConstEvalfunction that recursively evaluates constants through casts and foldable operations - Replaces the previous
skipTrunc-based pitch validation with the new evaluation approach - Adds width validation using the same constant evaluation mechanism
- Updates test kernel to mark stride parameters as
constexprto enable better compile-time evaluation
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp | Implements new constant evaluation helpers and applies them to pitch/width validation in block I/O conversion |
| python/test/unit/language/test_block_pointer.py | Marks stride parameters as constexpr in test kernel to support improved constant folding |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (succeeded(def->fold(results)) && results.size() == 1) { | ||
| if (auto val = dyn_cast_or_null<Value>(results[0])) | ||
| return val; | ||
| } |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fold result could be an Attribute containing a constant value, not just a Value. If results[0] is an Attribute, it should be converted to a constant Value rather than being ignored. This would cause tryConstEval to miss foldable constant attributes.
| M, N, K, # | ||
| stride_am, stride_ak, # | ||
| stride_bk, stride_bn, # | ||
| stride_am: tl.constexpr, stride_ak: tl.constexpr, # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this changes the test. What happens if these arguments are not constexpr, does the test still fails ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, because it resolves finally to %arg, so it's still unknown at compile time and later in runtime it's 32, so it's not valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not change this existing test case, and add one test under Intel folder with TRITON_INTEL_2DBLOCK_ASSERT set in the file.
0806403 to
c1803e9
Compare
| return v; | ||
| } | ||
|
|
||
| static std::optional<int64_t> tryConstEval(Value v, int depth = 16) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we port these changes to Utility.cpp and improve mlir::triton::intel::getFoldedConstantValue??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will increase work done while compilation because getFoldedConstantValue is used in isConstant in many places and my version does multiple skips and folds instead of single fold, however IDK if it's a problem and how obsessive we are about cycles at this step.
third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp
Outdated
Show resolved
Hide resolved
third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp
Outdated
Show resolved
Hide resolved
| M, N, K, # | ||
| stride_am, stride_ak, # | ||
| stride_bk, stride_bn, # | ||
| stride_am: tl.constexpr, stride_ak: tl.constexpr, # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not change this existing test case, and add one test under Intel folder with TRITON_INTEL_2DBLOCK_ASSERT set in the file.
Co-authored-by: Whitney Tsang <whitney.tsang@intel.com>
Co-authored-by: Whitney Tsang <whitney.tsang@intel.com>
This PR improves constant expression folding for pitch and width parameters in Intel GPU block I/O operations. The changes introduce a more robust constant evaluation mechanism that handles multiple levels of type casts and operation folding, addressing issue #5338.