Commit 9117748
[AMD] Bypass analysis to use buffer ops for tt.pointer_range=32 tensors (#8302)
This change, combined with the previous PR#7939, is to convert memory
accesses of small-tensors into buffer-ops *WITHOUT* analyzing their
offsets' value range.
* Some context
- We informally call a tensor a "small-tensor" if it is no more than 2G
bytes in size.
- A JIT function is specialized to distinguish small-tensors from
larger ones; a formal argument of the function is tagged with
tt.pointer_range=32 if it binds to a small-tensor.
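The tagging rule above can be sketched as follows. This is only an illustration: the helper names are hypothetical, and reading "2G bytes" as 2**31 bytes is an assumption, not something taken from Triton's specialization code.

```python
# Hypothetical sketch of the small-tensor rule described above.
# ASSUMPTION: "2G bytes" is read as 2**31 bytes; the real threshold and
# the real specialization code in Triton may differ.
SMALL_TENSOR_LIMIT_BYTES = 2**31


def is_small_tensor(num_elements: int, element_size_bytes: int) -> bool:
    """Return True if the whole tensor is no larger than the assumed limit."""
    return num_elements * element_size_bytes <= SMALL_TENSOR_LIMIT_BYTES


def pointer_range_attr(num_elements: int, element_size_bytes: int) -> dict:
    """Return the attribute a small-tensor argument would carry, else nothing."""
    if is_small_tensor(num_elements, element_size_bytes):
        return {"tt.pointer_range": 32}
    return {}
```

Under these assumptions, a 1024-element fp32 tensor is tagged, while a 2**30-element fp32 tensor (4G bytes) is not.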
* This Change
- Together with PR#7939, this change converts memory accesses of
small-tensors into buffer-ops without analyzing the value range of
their offsets.
- PR#7939 reveals the base-pointer; this PR performs the conversion
unconditionally.
- It side-steps the defects and limitations of offset-range-analysis, and
it's safe!
* TODO
- If the offset of a small-tensor mem-op is a 64-bit quantity, we can
cast the offset to 32 bits and then convert the mem-op.
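A small sketch of why such a cast could be value-preserving. It assumes offsets are measured from the base pointer and stay within the signed 32-bit range, which is what one would expect for in-bounds accesses to a small-tensor; the function name is hypothetical and this is not the planned implementation.

```python
def truncate_to_i32(offset: int) -> int:
    """Keep the low 32 bits of an integer and reinterpret them as signed."""
    low = offset & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


def cast_is_lossless(offset: int) -> bool:
    """True if truncating a 64-bit offset to 32 bits keeps its value."""
    return truncate_to_i32(offset) == offset
```

Offsets inside the signed 32-bit range survive the truncation unchanged; anything outside it wraps around, which is why the cast only makes sense for tensors already known to be small.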
* Option
- Pass option: `analyzeSmallTensorOfst={false|true}`, false by default.
When it is false, the pass does not analyze the offset's value range
when it comes across a mem-op of a small-tensor.
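The decision described in this message can be summarized with the sketch below. Function and parameter names are hypothetical, and the fallback for the cases this PR does not cover (larger tensors, 64-bit offsets, or the option set to true) is assumed to be the existing offset range analysis; the actual pass is written in C++ and may be structured differently.

```python
def should_convert_to_buffer_op(
    pointer_range: int,
    offset_bits: int,
    analyze_small_tensor_ofst: bool = False,
    range_analysis_ok: bool = False,
) -> bool:
    """Decide whether a mem-op may be rewritten as a buffer-op (illustrative)."""
    is_small_tensor = pointer_range == 32
    # Small-tensor with a 32-bit offset: convert without range analysis,
    # unless the option asks for the analysis to run anyway.
    if is_small_tensor and offset_bits == 32 and not analyze_small_tensor_ofst:
        return True
    # ASSUMPTION: every other case relies on the result of range analysis.
    return range_analysis_ok
```

With the default option value, a small-tensor access with a 32-bit offset is converted even when range analysis would have failed; setting the option to true brings the analysis back for such accesses.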
---------
Co-authored-by: Shuxin Yang <Shuxin.Yang@gmail.com>
1 parent b50872a commit 9117748
File tree
4 files changed, +728 -23 lines changed
- test/TritonGPU/amd
- third_party/amd
  - include/TritonAMDGPUTransforms
  - lib/TritonAMDGPUTransforms