@whitneywhtsang whitneywhtsang commented Nov 17, 2025

This PR changes the Triton base from b3cf593 to 318fa9c (Nov 2).
Pass rate: 94.95%->98.1%

Paran0idy and others added 7 commits October 31, 2025 23:01
…ions (#8605)

Currently, `async_wait` in Gluon on `CDNA4` requires the kernel writer to
pass the number of outstanding hardware instructions (LLVM intrinsics) to
`async_wait`. This count is very difficult to compute because it depends on
layouts, sizes, contiguity, and so on.

This PR changes the semantics of `async_wait` to represent the number of
outstanding commit groups, following the semantics used for NVIDIA in
Gluon. Gluon kernels therefore need to commit outstanding async
operations via `commit_group` and then wait on them via `wait_group`. I
also adapted the names so that existing Gluon kernels using the old
semantics error out.
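To make the commit-group semantics concrete, here is a minimal pure-Python model. It is only a sketch: the class `AsyncGroupModel` and its `issue` method are hypothetical and are not part of the Gluon API. Only the names `commit_group` and `wait_group` are taken from the description above, and the assumption is that groups retire oldest first.

```python
class AsyncGroupModel:
    """Hypothetical toy model of commit-group semantics (not the Gluon API)."""

    def __init__(self):
        self.pending = 0   # async ops issued since the last commit
        self.groups = []   # committed groups, oldest first (op count per group)

    def issue(self, n=1):
        # Record n async operations that are not yet part of any group.
        self.pending += n

    def commit_group(self):
        # Close the current batch of async ops into one commit group.
        self.groups.append(self.pending)
        self.pending = 0

    def wait_group(self, n):
        # Retire the oldest groups until at most n groups remain outstanding.
        # Returns the number of async ops that had to complete.
        retired = 0
        while len(self.groups) > n:
            retired += self.groups.pop(0)
        return retired


model = AsyncGroupModel()
model.issue(3)
model.commit_group()
model.issue(2)
model.commit_group()
completed = model.wait_group(1)  # the older group (3 ops) must complete
```

The kernel writer reasons only about groups (`wait_group(1)`), never about how many hardware instructions each group expands to.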

`UpdateAsyncWaitCount` is extended to compute the number of outstanding
hardware instructions based on the number of outstanding commit groups.
Previously, it only worked on `async_waits` carrying tokens of the
commit groups, which are not available when compiling a Gluon kernel.
The new computation walks the IR backwards, following *all* possible
control flow paths, and finds the smallest number of emitted
instructions for N outstanding commit groups.
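The backward walk can be sketched on a toy control-flow graph. This is an illustration under stated assumptions, not the actual `UpdateAsyncWaitCount` pass: the function name, the op encoding (`("async", k)`, `("commit",)`, `("wait", n)`) and the `blocks`/`preds` dictionaries are all hypothetical. It also assumes that instructions issued after the most recent commit count towards the result, and that a path ends at the (N+1)-th commit seen backwards or at the function entry.

```python
import heapq


def min_outstanding_instructions(blocks, preds, wait_block, wait_index, n_groups):
    """Minimum, over all backward paths, of async instructions issued
    after the (n_groups + 1)-th most recent commit (toy model)."""

    def scan(ops, cost, commits):
        # Walk ops in reverse, accumulating instruction counts until the
        # (n_groups + 1)-th commit is reached.
        for op in reversed(ops):
            if op[0] == "async":
                cost += op[1]
            elif op[0] == "commit":
                commits += 1
                if commits > n_groups:
                    return cost, commits, True
        return cost, commits, False

    # Start with the ops that precede the wait inside its own block.
    cost, commits, done = scan(blocks[wait_block][:wait_index], 0, 0)
    heap = [(cost, done, commits, "" if done else wait_block)]
    visited = set()
    # Dijkstra-style search: the first finished path popped is the cheapest,
    # and the visited set keeps loops from being walked forever.
    while heap:
        cost, done, commits, block = heapq.heappop(heap)
        if done:
            return cost
        if (block, commits) in visited:
            continue
        visited.add((block, commits))
        if not preds.get(block):
            return cost  # reached the function entry
        for pred in preds[block]:
            c, k, d = scan(blocks[pred], cost, commits)
            heapq.heappush(heap, (c, d, k, "" if d else pred))
    return 0  # no terminating path found: conservatively wait for everything


# Hypothetical CFG: one branch emits 4 instructions, the other only 1.
blocks = {
    "entry": [("async", 1), ("commit",)],
    "then": [("async", 4), ("commit",)],
    "else": [("async", 1), ("commit",)],
    "join": [("wait", 1)],
}
preds = {"then": ["entry"], "else": ["entry"], "join": ["then", "else"]}
count = min_outstanding_instructions(blocks, preds, "join", 0, n_groups=1)
```

Taking the minimum over paths is the conservative choice: allowing fewer outstanding instructions than a path actually issued only makes the wait stricter, never incorrect.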

On GFX9, this PR lifts the computation of `wave_id` to
the entry of the function and additionally emits
`llvm.amdgcn.readfirstlane`. This gives us optimized
code generation inside the loop.

This PR introduces a common interface for buffer ops.

---------

Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>

The API doesn't accept a scale for the intermediate tensor produced between
split_k and fused_scatter; this mode should therefore be disabled for
now.

It will be re-enabled after expert aggregation is moved out of the
matmul_ogs API.
@whitneywhtsang whitneywhtsang self-assigned this Nov 17, 2025
Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
@whitneywhtsang whitneywhtsang marked this pull request as ready for review November 18, 2025 05:05
@whitneywhtsang whitneywhtsang merged commit 97f32cc into main Nov 18, 2025
25 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge branch November 18, 2025 05:05

9 participants