Commit cacfe10

Permute the pass pipeline to coalesce before setting up the matmul (#2956)
Required for #2834. There are two reasons to do this. First, it properly tags the layouts with their memory order very early in the TTGIR pipeline. Second, it moves our TTGIR pipeline closer to upstream. I am splitting the change to isolate any regressions or undesired behavior caused by this change from those caused by changing the DPAS layouts in #2834. cc #2354
1 parent ddecf19 commit cacfe10

1 file changed: +4 -2 lines changed

third_party/intel/backend/compiler.py

Lines changed: 4 additions & 2 deletions
@@ -239,15 +239,17 @@ def make_ttgir(mod, metadata, opt, properties):
             return XPUBackend.AdvancedPath.make_ttgir(mod, metadata, opt)
 
         passes.ttir.add_convert_to_ttgpuir(pm, "xpu", opt.num_warps, opt.threads_per_warp, opt.num_ctas)
+        # optimize TTGIR
+        intel.passes.ttgpuir.add_coalesce(pm)
+        intel.passes.ttgpuir.add_remove_layout_conversions(pm)
+
         intel.passes.ttgpuir.add_accelerate_matmul(pm)
         intel.passes.ttgpuir.add_remove_layout_conversions(pm)
         intel.passes.ttgpuir.add_materialize_block_pointer(pm)
         if os.getenv("TRITON_INTEL_REWRITE_TENSOR_POINTER", "0") == "1":
             intel.passes.ttgpuir.add_rewrite_tensor_pointer(pm)
         intel.passes.ttgpuir.add_pipeline(pm, opt.num_stages, False)
 
-        intel.passes.ttgpuir.add_coalesce(pm)
-        intel.passes.ttgpuir.add_remove_layout_conversions(pm)
         passes.ttgpuir.add_optimize_thread_locality(pm)
         passes.ttgpuir.add_optimize_dot_operands(pm, True)
         passes.common.add_cse(pm)
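
For reviewers who want to check the resulting order at a glance, here is a minimal sketch of the pipeline as it stands after this commit. It does not use the real Triton pass manager: `RecordingPassManager` and `build_ttgir_pipeline` are hypothetical stand-ins that only record pass names, and the name strings are shortened from the `add_*` calls in the diff above. Only the passes visible in this hunk are modeled.

```python
# Hypothetical stand-in for the real pass manager: it records pass names
# in the order they are added and runs nothing.
class RecordingPassManager:
    def __init__(self):
        self.order = []

    def add(self, name):
        self.order.append(name)


def build_ttgir_pipeline(pm, rewrite_tensor_pointer=False):
    """Mirror the post-commit ordering of the passes shown in the hunk."""
    pm.add("convert_to_ttgpuir")
    # After this commit, coalesce (plus layout-conversion cleanup) runs
    # before the matmul is set up.
    pm.add("coalesce")
    pm.add("remove_layout_conversions")
    pm.add("accelerate_matmul")
    pm.add("remove_layout_conversions")
    pm.add("materialize_block_pointer")
    # Models the TRITON_INTEL_REWRITE_TENSOR_POINTER == "1" branch.
    if rewrite_tensor_pointer:
        pm.add("rewrite_tensor_pointer")
    pm.add("pipeline")
    pm.add("optimize_thread_locality")
    pm.add("optimize_dot_operands")
    pm.add("cse")
    return pm.order


order = build_ttgir_pipeline(RecordingPassManager())
# The property this commit is after: coalesce precedes accelerate_matmul.
assert order.index("coalesce") < order.index("accelerate_matmul")
```

The sketch only captures ordering. Whether the earlier coalesce changes generated code or performance has to be measured against the real backend, which is why the change is split from #2834.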
