
Commit ccfea26 (parent: 80f9b92)

update readme

Signed-off-by: He, Xin3 <xin3.he@intel.com>

1 file changed (5 additions, 6 deletions):
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision/README.md
@@ -110,16 +110,15 @@ Model with mixed precision is not supported in vLLM, but supported in transformers
 ```bash
 # Command to save model:
 python quantize.py \
-    --model_name_or_path /ssd/hf_models/Llama-3.3-70B-Instruct \
+    --model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
     --quantize \
+    --iters 0 \
     --dtype MXFP4 \
     --use_recipe \
-    --recipe_file recipes/Meta-Llama-3.3-70B-Instruct_5bits.json \
+    --recipe_file recipes/Meta-Llama-3.1-8B-Instruct_7bits.json \
     --save \
     --save_format auto_round \
-    --save_path Llama-3.3-70B-Instruct-MXFP4-MXFP8-AR \
-    --enable_torch_compile
-
+    --save_path Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR
 # Command to inference with transformer:
-python run_hf_inf.py Llama-3.3-70B-Instruct-MXFP4-MXFP8-AR
+python run_hf_inf.py Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR
 ```
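For context on the `--dtype MXFP4` flag in the command above: MXFP4 is the OCP microscaling FP4 format, which pairs a shared power-of-two scale per block of values with 4-bit E2M1 elements. The toy sketch below illustrates the rounding scheme only; it is my own illustrative code, not the implementation behind `quantize.py`, and real implementations may choose the shared scale differently.

```python
import math

# Positive magnitudes representable in FP4 E2M1 (sign handled separately).
FP4_E2M1_POS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mxfp4_block(block):
    """Fake-quantize one block of values, MXFP4-style (toy version).

    A shared power-of-two scale is chosen so the block's largest
    magnitude lands within the E2M1 range (max magnitude 6.0), then
    every element is rounded to the nearest representable value.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # One common scale choice: 2^(floor(log2(amax)) - 2), since E2M1's
    # largest binade is 2^2 (6.0 = 2^2 * 1.5). Details vary by library.
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for v in block:
        q = min(FP4_E2M1_POS, key=lambda g: abs(g - abs(v) / scale))
        out.append(math.copysign(q * scale, v))
    return out
```

Values that already sit on the scaled grid round-trip exactly; everything else is snapped to the nearest representable point. That rounding error is the precision loss the mixed-precision recipe files (note the MXFP4-MXFP8 save path) trade off per layer.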
