Use matmul fwd directly in autograd for performance #1045

tianrengao · 2025-10-28T19:53:36Z

This PR addresses a performance concern with the previous matmul backward implementation. In PR #748, the matmul bwd was implemented as a dedicated kernel that runs in two passes. As @ngimel pointed out, we can instead call matmul fwd twice, since matmul fwd is already fully optimized. This PR updates matmul_autograd and addmm_autograd to use two matmul fwd calls instead of the dedicated matmul_bwd and addmm_bwd kernels. benchmark/run.py now calls only these updated backward implementations.

However, the @helion.kernel decorator does not allow calling another function (matmul_fwd) inside the kernel definition. The original matmul_bwd and addmm_bwd are therefore kept in examples/matmul.py as examples only; the benchmark run does not use them.

jansel · 2025-11-07T02:30:21Z

examples/matmul.py

+        # grad_mat1 = grad_out @ mat2.T
+        grad_mat1 = matmul(grad_out, mat2.T)
+
+        # grad_mat2 = mat1.T @ grad_out
+        grad_mat2 = matmul(mat1.T, grad_out)


You only need to compute these if requires_grad is set on the inputs.
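A minimal sketch of what this suggestion could look like, assuming the backward lives in a `torch.autograd.Function`. `ctx.needs_input_grad` is the standard PyTorch mechanism for this check. The `matmul` helper below is a plain `torch.matmul` stand-in for the Helion forward kernel, and the class name `MatMulFunction` is hypothetical, not taken from the PR.

```python
import torch


def matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Stand-in for the optimized Helion forward kernel (assumption:
    # plain torch.matmul is used here so the sketch runs anywhere).
    return torch.matmul(a, b)


class MatMulFunction(torch.autograd.Function):
    # Hypothetical name; illustrates the gating pattern only.

    @staticmethod
    def forward(ctx, mat1, mat2):
        ctx.save_for_backward(mat1, mat2)
        return matmul(mat1, mat2)

    @staticmethod
    def backward(ctx, grad_out):
        mat1, mat2 = ctx.saved_tensors
        grad_mat1 = grad_mat2 = None

        # Launch each backward matmul only if that input needs a gradient.
        if ctx.needs_input_grad[0]:
            # grad_mat1 = grad_out @ mat2.T
            grad_mat1 = matmul(grad_out, mat2.T)
        if ctx.needs_input_grad[1]:
            # grad_mat2 = mat1.T @ grad_out
            grad_mat2 = matmul(mat1.T, grad_out)

        return grad_mat1, grad_mat2
```

With this gating, a call where only `mat1` has `requires_grad=True` runs a single matmul in the backward pass instead of two.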

jansel · 2025-11-07T02:31:31Z

examples/matmul.py

+        # grad_bias = beta * grad_out
+        grad_bias = beta * grad_out
+
+        # grad_mat1 = alpha * (grad_out @ mat2.T)
+        grad_mat1 = alpha * matmul(grad_out, mat2.T)
+
+        # grad_mat2 = alpha * (mat1.T @ grad_out)
+        grad_mat2 = alpha * matmul(mat1.T, grad_out)


This results in extra kernels; you should define an epilogue function to put the scaling into the matmul kernel.

Also, the same issue as above applies here.
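One way to read both comments together is sketched below. The `matmul` helper is a hypothetical stand-in that takes an `epilogue` callable applied to the accumulator; its signature is an assumption for illustration and is not the signature of the kernel in examples/matmul.py. In plain PyTorch the epilogue is not actually fused, so this only shows the call structure: the `alpha` scaling is passed into the matmul call rather than applied as a separate op, and each gradient is gated on `ctx.needs_input_grad`. The sketch also assumes `bias` has the same shape as the output (no broadcasting).

```python
from typing import Callable, Optional

import torch

Epilogue = Callable[[torch.Tensor], torch.Tensor]


def matmul(a: torch.Tensor, b: torch.Tensor,
           epilogue: Optional[Epilogue] = None) -> torch.Tensor:
    # Hypothetical stand-in for a matmul kernel with an epilogue hook.
    # A real fused kernel would apply the epilogue to each accumulator
    # tile before storing; here it is applied to the full result.
    acc = torch.matmul(a, b)
    return acc if epilogue is None else epilogue(acc)


class AddMMFunction(torch.autograd.Function):
    # Hypothetical name; computes beta * bias + alpha * (mat1 @ mat2).

    @staticmethod
    def forward(ctx, bias, mat1, mat2, alpha, beta):
        ctx.save_for_backward(mat1, mat2)
        ctx.alpha = alpha
        ctx.beta = beta
        return matmul(mat1, mat2, lambda acc: alpha * acc + beta * bias)

    @staticmethod
    def backward(ctx, grad_out):
        mat1, mat2 = ctx.saved_tensors
        alpha, beta = ctx.alpha, ctx.beta
        grad_bias = grad_mat1 = grad_mat2 = None

        def scale(acc: torch.Tensor) -> torch.Tensor:
            # The alpha scaling travels into the matmul call as its epilogue.
            return alpha * acc

        if ctx.needs_input_grad[0]:
            # grad_bias = beta * grad_out
            grad_bias = beta * grad_out
        if ctx.needs_input_grad[1]:
            # grad_mat1 = alpha * (grad_out @ mat2.T)
            grad_mat1 = matmul(grad_out, mat2.T, scale)
        if ctx.needs_input_grad[2]:
            # grad_mat2 = alpha * (mat1.T @ grad_out)
            grad_mat2 = matmul(mat1.T, grad_out, scale)

        # alpha and beta are plain Python scalars, so they get no gradient.
        return grad_bias, grad_mat1, grad_mat2, None, None
```

Whether the scaling is really fused depends on how the underlying kernel handles its epilogue; the stand-in above makes no performance claim.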

meta-cla bot added the CLA Signed label Oct 28, 2025

tianrengao added 2 commits November 4, 2025 21:00

use matmul direclty in bwd for performance

95b2255

add test

ad73fba

tianrengao force-pushed the tianren/addmm_bwd_fix_impl branch from 7301838 to ad73fba Compare November 5, 2025 05:02

tianrengao and others added 3 commits November 4, 2025 22:07

revert example

02862f6

fix lint

21f56a1

Merge branch 'main' into tianren/addmm_bwd_fix_impl

f86fd9d

tianrengao marked this pull request as ready for review November 5, 2025 18:47

tianrengao requested review from ngimel and yf225 November 5, 2025 18:52

tianrengao changed the title ~~use matmul direclty in bwd for performance~~ Use matmul fwd directly in autograd for performance Nov 5, 2025

jansel requested changes Nov 7, 2025
