Add per tensor fp8 conv2d support #3315
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3315
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (9 Unrelated Failures) As of commit 22d7227 with merge base e8c4d09.
BROKEN TRUNK: the following jobs failed but were already present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 007ea02 to 8bf4032
    padding = [0, *padding]
    stride = [1, *stride]
    dilation = [1, *dilation]
    res = _quantize_and_scaled_conv3d(
is this unsqueezing to turn 2d into 3d? Does fbgemm only have 3d conv kernels?
yeah this turns 2d conv to 3d and it's because fbgemm only supports 3d conv right now
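To make that concrete, here is a minimal sketch of the 2d → 3d lifting, assembled from the lines shown in the hunk above; the real helper in the PR, `_quantize_and_scaled_conv3d`, may take additional arguments (e.g. scales), so treat the signature as illustrative:

```python
def _conv2d_via_conv3d(input_tensor, weight_tensor, padding, stride, dilation):
    # Insert a dummy depth dimension D=1 so the fbgemm 3d kernel can be reused.
    input_3d = input_tensor.unsqueeze(2)    # (N, C, H, W)   -> (N, C, 1, H, W)
    weight_3d = weight_tensor.unsqueeze(2)  # (O, C, kH, kW) -> (O, C, 1, kH, kW)

    # Prepend neutral values for the depth dimension of the conv params.
    padding = [0, *padding]
    stride = [1, *stride]
    dilation = [1, *dilation]

    res = _quantize_and_scaled_conv3d(
        input_3d, weight_3d, padding, stride, dilation
    )

    # The depth dimension stays 1, so it can be squeezed back out.
    assert res.shape[2] == 1
    return res.squeeze(2)
```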
-    if weight.dim() == 5:
-        # weights for conv3d
+    if weight.dim() in [4, 5]:
is there anything more robust we can check here?
we can't really distinguish from here whether it's a linear or a conv weight, I think (although right now linear is 2d/3d and conv is 4d/5d; maybe conv1d could have a 3d weight, which would overlap with linear)
but we could potentially separate the handling of conv and linear by passing the module around as well, if this is needed in the future
I can add a comment for now
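For reference, the check being discussed boils down to a dim-based heuristic like the sketch below (illustrative only, not the exact code in the PR): linear weights are 2d/3d and conv weights are 4d/5d today, so a hypothetical 3d conv1d weight would be ambiguous.

```python
def _is_conv_weight(weight) -> bool:
    # conv2d weight: (O, C, kH, kW); conv3d weight: (O, C, kD, kH, kW)
    # Note: a conv1d weight (O, C, kW) would be 3d and overlap with a 3d
    # linear weight, which is why this heuristic is not fully robust.
    return weight.dim() in [4, 5]
```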
| "Please make sure both activation and weights are in the `channels_last` memory_format" | ||
| ) | ||
| input_tensor = input_tensor.unsqueeze(2) | ||
| weight_tensor = weight_tensor.unsqueeze(2) |
Can we reuse this code, e.g. have aten.conv2d call into aten.convolution?
yeah, I think that should be possible for both aten.conv2d and aten.conv3d; I can refactor that in the next PR?
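A possible shape for that follow-up refactor, just as a sketch (function names here are hypothetical, not the PR's actual ops): route the 2d case through the shared 3d path so the unsqueeze/squeeze bookkeeping lives in one place.

```python
def _scaled_convolution(input_tensor, weight_tensor, padding, stride, dilation):
    if weight_tensor.dim() == 4:
        # 2d case: lift to 3d, run the shared 3d path, drop the depth dim.
        out = _scaled_convolution(
            input_tensor.unsqueeze(2),
            weight_tensor.unsqueeze(2),
            [0, *padding],
            [1, *stride],
            [1, *dilation],
        )
        return out.squeeze(2)
    # 3d case: call the fp8 conv3d implementation directly.
    return _quantize_and_scaled_conv3d(
        input_tensor, weight_tensor, padding, stride, dilation
    )
```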
Force-pushed from fe50ba6 to 9cf37be
Summary:
Add fp8 conv2d support, using the same conv3d kernels, by setting the D dimension to 1.
1. unsqueeze both input and weight at dim 2 (the D dimension)
2. call the fp8 conv3d op from fbgemm: `torch.ops.fbgemm.f8f8bf16_conv`
3. assert that the D dimension of the result is 1 and squeeze it out at dim 2 (`res.squeeze(2)`) to remove the D dimension

Test Plan:
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_unsqueeze_conv2d_weight
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py -k test_fp8_conv_variants
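As a quick sanity check of the shape bookkeeping in the summary, the lifting can be exercised with the eager conv ops; `F.conv3d` stands in for `torch.ops.fbgemm.f8f8bf16_conv` here, since the fbgemm op's exact argument list isn't shown in this PR excerpt:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 16, 32, 32)  # conv2d input  (N, C, H, W)
w = torch.randn(8, 16, 3, 3)    # conv2d weight (O, C, kH, kW)

# Step 1: unsqueeze both input and weight at dim 2 (the D dimension).
x3 = x.unsqueeze(2)             # (2, 16, 1, 32, 32)
w3 = w.unsqueeze(2)             # (8, 16, 1, 3, 3)

# Step 2: run the 3d conv (the PR calls the fbgemm fp8 kernel here instead).
out3 = F.conv3d(x3, w3, stride=[1, 1, 1], padding=[0, 1, 1], dilation=[1, 1, 1])

# Step 3: the D dimension stays 1, so squeeze it back out.
assert out3.shape[2] == 1
out = out3.squeeze(2)           # (2, 8, 32, 32)

# Matches a plain 2d conv on the original tensors.
ref = F.conv2d(x, w, stride=[1, 1], padding=[1, 1], dilation=[1, 1])
assert torch.allclose(out, ref, atol=1e-5)
```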
Force-pushed from 9cf37be to 22d7227