
Conversation

@laithsakka
Contributor

@laithsakka laithsakka commented Sep 17, 2025

Two main changes:

  1. Add an option to not use the bytecode hook. torch 2.8 has a way to drop guards without that hack
    (see the sketch after this list). We will switch over entirely in a follow-up PR, after internal
    validation of perf once this lands, since @zou3519 has concerns about the perf, though I strongly
    doubt there will be any issues.

  2. TorchCompileWrapperWithCustomDispatcher has complexities that are not used in the code base.
    Its current usage is to bypass guard evaluation, so I am introducing a much simpler TorchCompileWithNoGuardsWrapper.
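For context, a minimal sketch of how guards can be dropped with the torch 2.8 `guard_filter_fn` compile option (the same option the review below refers to). The wrapper name and structure here are illustrative only, not the actual vLLM implementation:

```python
import torch


class NoGuardsWrapper:  # illustrative name, not the class added in this PR
    """Compile a module once and dispatch to it without Dynamo guard evaluation."""

    def __init__(self, module: torch.nn.Module, backend: str = "inductor"):
        self.compiled = torch.compile(
            module,
            fullgraph=True,
            backend=backend,
            # Return False for every guard entry so no guards are installed
            # and no guard evaluation happens on subsequent calls.
            options={"guard_filter_fn": lambda entries: [False] * len(entries)},
        )

    def __call__(self, *args, **kwargs):
        return self.compiled(*args, **kwargs)


# Usage: wrap a module and call it like the original.
wrapper = NoGuardsWrapper(torch.nn.Linear(8, 8))
out = wrapper(torch.randn(2, 8))
```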

In the next PR I will add a debug-mode option that keeps the DS guards and fails if any of them get violated (a rough sketch of what that could look like follows).
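A hedged sketch of such a debug mode, assuming that keeping the guards plus `torch._dynamo.config.error_on_recompile` is enough to turn a guard violation into a hard error; the `debug_keep_guards` flag and the stand-in model are hypothetical, not existing vLLM options:

```python
import torch

debug_keep_guards = True  # hypothetical flag, not an existing vLLM option
model = torch.nn.Linear(8, 8)  # stand-in module for illustration

if debug_keep_guards:
    # Debug mode: keep all Dynamo guards and turn any guard violation into a
    # hard error instead of a silent recompile.
    torch._dynamo.config.error_on_recompile = True
    options = None
else:
    # Path taken by this PR: strip every guard at compile time.
    options = {"guard_filter_fn": lambda entries: [False] * len(entries)}

compiled = torch.compile(model, fullgraph=True, options=options)
out = compiled(torch.randn(2, 8))
```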

Performance:

How does this affect runtime?
I ran the following benchmarks:

vllm bench latency
--model Qwen/Qwen2-1.5B-Instruct
--input-len 128
--output-len 256
--num-iters 50
--dtype float16

after:

10% percentile latency: 0.9403901444951771 seconds
25% percentile latency: 0.9434001832487411 seconds
50% percentile latency: 0.9502700129960431 seconds
75% percentile latency: 0.9560497467537061 seconds
90% percentile latency: 0.9588484280975536 seconds
99% percentile latency: 0.9626266451006813 seconds

before:

10% percentile latency: 0.9399352639040444 seconds
25% percentile latency: 0.9432627985042927 seconds
50% percentile latency: 0.9482277854986023 seconds
75% percentile latency: 0.9578490345011232 seconds
90% percentile latency: 0.9600405636985669 seconds
99% percentile latency: 0.9649963746999856 seconds

vllm bench throughput --model Qwen/Qwen2-1.5B-Instruct --input-len 512 --output-len 128 --num-prompts 1000

after

Throughput: 130.14 requests/s, 83113.04 total tokens/s, 16658.08 output tokens/s
Total num prompt tokens:  510637
Total num output tokens:  128000

before

Throughput: 129.54 requests/s, 82726.85 total tokens/s, 16580.68 output tokens/s
Total num prompt tokens:  510637
Total num output tokens:  128000

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the torch.compile wrapper to simplify its implementation. It removes the complex bytecode hooking mechanism in favor of the guard_filter_fn option available in newer PyTorch versions to drop guards. The new TorchCompileGuardsStripWrapper is much cleaner. The changes also include updating the support_torch_compile decorator and related tests to use the new wrapper. The overall change improves code clarity and maintainability. I've found one critical issue in the test suite that needs to be addressed.

@laithsakka laithsakka marked this pull request as draft September 17, 2025 23:18
@mergify mergify bot removed the tpu Related to Google TPUs label Sep 17, 2025
@laithsakka laithsakka force-pushed the something branch 3 times, most recently from 6f72573 to bcc0f99 on September 18, 2025 00:05
@mergify mergify bot added the tpu Related to Google TPUs label Sep 18, 2025
@laithsakka laithsakka marked this pull request as ready for review September 18, 2025 00:21
@mergify

mergify bot commented Sep 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @laithsakka.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 21, 2025
@ProExpertProg
Collaborator

@bigPYJ1151 is planning to fix the CPU torch issue in vLLM, so once that's done we can upgrade to torch==2.8 everywhere and merge this PR.

Collaborator

@ProExpertProg ProExpertProg left a comment

Looks good, can you reformat?

@mergify mergify bot added multi-modality Related to multi-modality (#4194) qwen Related to Qwen models labels Nov 12, 2025
…patcher with TorchCompileGuardsStripWrapper

Signed-off-by: Laith Sakka <lsakka@meta.com>
@laithsakka
Contributor Author

rebase again

@mergify mergify bot removed the needs-rebase label Nov 13, 2025
@laithsakka laithsakka changed the title Remove the bytecode hook and simplify TorchCompileWrapperWithCustomDipatch Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch Nov 13, 2025
@laithsakka
Contributor Author

@ProExpertProg can you help me land it before another rebase is needed :)

@vllm-bot vllm-bot merged commit 2e0ad62 into vllm-project:main Nov 14, 2025
53 of 54 checks passed
@hmellor
Member

hmellor commented Nov 15, 2025

FYI the docs build in this PR was failing so now it's failing on main too.

We're working on a fix.

@ProExpertProg
Collaborator

Sorry about that. I looked at the error message and it looked unrelated; I should have checked nightly.

@DarkLight1337
Member

Fixed by #28772

@hmellor
Member

hmellor commented Nov 15, 2025

It was an issue with mkdocstrings, which I've reported to them.

For future reference, the docs are built for every commit on main. So if the docs are green on main but failing in a PR, it's a real failure.

geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
…vllm-project#25110)

Signed-off-by: Laith Sakka <lsakka@meta.com>
Signed-off-by: George D. Torres <gdavtor@gmail.com>
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
…vllm-project#25110)

Signed-off-by: Laith Sakka <lsakka@meta.com>
Signed-off-by: Bram Wasti <bwasti@meta.com>
wangxiyuan added a commit to vllm-project/vllm-ascend that referenced this pull request Nov 26, 2025
Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by
vllm-project/vllm#26866
2. get_mrope_input_positions is broken by
vllm-project/vllm#28399
3. graph mode is broken by
vllm-project/vllm#25110; we'll upgrade torch to
2.8 to fix the problem later
4. embedding is broken by
vllm-project/vllm#27583
5. `get_attn_backend_cls` and attention backend are broken by
vllm-project/vllm#28534
6. spec decode is broken by
vllm-project/vllm#28771
7. sp feature is broken by
vllm-project/vllm#27126
8. mtp is broken by vllm-project/vllm#27922
9. lora is broken by vllm-project/vllm#21068
10. execute_model is broken by
vllm-project/vllm#26866
11. `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by
vllm-project/vllm#28159
12. kv cache is broken by vllm-project/vllm#27753
13. dp is broken by vllm-project/vllm#25110

 
What's broken and changed by ourselves:
1. qwen vl is broken by vllm-project/vllm#28455.
We'll remove model files in the future to avoid this kind of error.
2. Engine core is broken by
vllm-project/vllm#23691. We'll remove the patch
file in the future.
3. Ascend scheduler is broken by
vllm-project/vllm#28733. We'll remove the Ascend
scheduler later.
4. qwen3-next is broken by
vllm-project/vllm#28083. We'll remove model files
in the future to avoid this kind of error.
5. qwen vl is broken by vllm-project/vllm#27764.
We'll remove model files in the future.

Known issue:
1. ray doesn't work 
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache + ascend scheduler + deepseek v2 lite is broken.

Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
Co-authored-by: shen-shanshan <467638484@qq.com>


- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Kurumi5210 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Nov 26, 2025
Bump vLLM version to v0.11.2
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025

Labels

llama (Related to Llama models), multi-modality (Related to multi-modality, #4194), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), tpu (Related to Google TPUs), v1
