1 parent c924f65 commit 3dcb958
hf_torchao_vllm/README.md
@@ -0,0 +1,11 @@
+# HF -> torchao -> vLLM convenience scripts
+
+Example
+
+```bash
+# save a quantized model to data/nvfp4-Qwen1.5-MoE-A2.7B
+python quantize_hf_model_with_torchao.py --model_name "Qwen/Qwen1.5-MoE-A2.7B" --experts_only_qwen_1_5_moe_a_2_7b True --save_model_to_disk True --quant_type nvfp4
+
+# run the model from above in vLLM
+python run_quantized_model_in_vllm.py --model_name "data/nvfp4-Qwen1.5-MoE-A2.7B" --compile False
+```