Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch #25110
Conversation
Code Review
This pull request refactors the torch.compile wrapper to simplify its implementation. It removes the complex bytecode hooking mechanism in favor of the guard_filter_fn option available in newer PyTorch versions to drop guards. The new TorchCompileGuardsStripWrapper is much cleaner. The changes also include updating the support_torch_compile decorator and related tests to use the new wrapper. The overall change improves code clarity and maintainability. I've found one critical issue in the test suite that needs to be addressed.
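As a rough illustration of the mechanism described above (not the actual vLLM code): in PyTorch 2.8+ a `guard_filter_fn` can be passed via `torch.compile`'s `options` dict; it receives one entry per guard Dynamo wants to install and returns a keep/drop flag for each, so returning all `False` strips every guard. The names `drop_all_guards` and `toy_forward` below are illustrative only.

```python
# Minimal sketch, assuming torch >= 2.8 with the guard_filter_fn compile option.
import torch


def drop_all_guards(guard_entries):
    # One boolean per proposed guard; False means "do not install this guard",
    # so the compiled code is reused without any guard re-evaluation.
    return [False] * len(guard_entries)


@torch.compile(options={"guard_filter_fn": drop_all_guards})
def toy_forward(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x) + 1


print(toy_forward(torch.randn(4)))
```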
This pull request has merge conflicts that must be resolved before it can be merged.
@bigPYJ1151 is planning to fix the CPU torch issue in vLLM, so once that's done we can upgrade to torch==2.8 everywhere and merge this PR.
ProExpertProg left a comment:
Looks good, can you reformat?
rebase again
@ProExpertProg can you help me land it before another rebase is needed :)
FYI, the docs build in this PR was failing, so now it's failing on `main`. We're working on a fix.
Sorry about that, I looked at the error message and it looked unrelated, I should have checked nightly.
Fixed by #28772
It was an issue with mkdocstrings, which I've reported to them. For future reference, the docs are built for every commit on `main`.
Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by vllm-project/vllm#26866
2. get_mrope_input_positions is broken by vllm-project/vllm#28399
3. graph mode is broken by vllm-project/vllm#25110; we'll upgrade torch to 2.8 to fix the problem later
4. embedding is broken by vllm-project/vllm#27583
5. `get_attn_backend_cls` and the attention backend are broken by vllm-project/vllm#28534
6. spec decode is broken by vllm-project/vllm#28771
7. sp feature is broken by vllm-project/vllm#27126
8. mtp is broken by vllm-project/vllm#27922
9. lora is broken by vllm-project/vllm#21068
10. execute_model is broken by vllm-project/vllm#26866
11. `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by vllm-project/vllm#28159
12. kv cache is broken by vllm-project/vllm#27753
13. dp is broken by vllm-project/vllm#25110

What's broken and changed by ourselves:
1. qwen vl is broken by vllm-project/vllm#28455. We'll remove model files in the future to avoid this kind of error.
2. Engine core is broken by vllm-project/vllm#23691. We'll remove the patch file in the future.
3. Ascend scheduler is broken by vllm-project/vllm#28733. We'll remove the Ascend scheduler later.
4. qwen3-next is broken by vllm-project/vllm#28083. We'll remove model files in the future to avoid this kind of error.
5. qwen vl is broken by vllm-project/vllm#27764. We'll remove model files in the future.

Known issues:
1. ray doesn't work
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache + ascend scheduler + deepseek v2 lite is broken

Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
Co-authored-by: shen-shanshan <467638484@qq.com>

- vLLM version: v0.11.2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Two main changes:
1. Add an option to not use the bytecode hook. torch 2.8 has a way to drop guards without that hack. We will switch completely away from the bytecode hook in a separate PR, after internal validation of perf once this lands, since @zou3519 has concerns about the perf (though I highly doubt there will be issues).
2. TorchCompileWrapperWithCustomDispatcher has complexities that are not used in the code base. Its current usage is to bypass guard evaluation, so I am just introducing a much simpler TorchCompileWithNoGuardsWrapper (a rough sketch follows below). In the next PR I will add a debug-mode option to keep DS guards and fail if they get violated.
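As a hedged sketch of what a guard-free wrapper in this spirit could look like (assumed names, not the actual TorchCompileWithNoGuardsWrapper implementation; assumes torch >= 2.8 with the `guard_filter_fn` option):

```python
# Hypothetical sketch only; class and attribute names are placeholders, not vLLM's code.
from typing import Any

import torch


class SimpleNoGuardsWrapper:
    """Compile `forward` once and reuse the compiled code without guard checks."""

    def __init__(self, backend: str = "inductor"):
        self._compiled = None
        self._backend = backend

    def forward(self, *args: Any, **kwargs: Any):
        # Subclasses provide the real model forward.
        raise NotImplementedError

    def __call__(self, *args: Any, **kwargs: Any):
        if self._compiled is None:
            # Drop every guard Dynamo proposes, so later calls never pay for
            # guard evaluation and never trigger guard-driven recompiles.
            self._compiled = torch.compile(
                self.forward,
                backend=self._backend,
                options={"guard_filter_fn": lambda entries: [False] * len(entries)},
            )
        return self._compiled(*args, **kwargs)
```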
Performance:
How does this affect runtime? I ran the following benchmarks:
vllm bench latency \
  --model Qwen/Qwen2-1.5B-Instruct \
  --input-len 128 \
  --output-len 256 \
  --num-iters 50 \
  --dtype float16
after:
before:
vllm bench throughput --model Qwen/Qwen2-1.5B-Instruct --input-len 512 --output-len 128 --num-prompts 1000
after
before