
Commit b645db6

fix llmc example for llama 4
Summary: need to enable calibration to get the experts to quantize
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
1 parent 9de0416 commit b645db6

File tree
1 file changed: +2 -0 lines changed


hf_torchao_vllm/quantize_hf_model_with_llm_compressor.py

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,7 @@
 import fire
 from datasets import load_dataset
 from llmcompressor import oneshot
+from llmcompressor.modeling import replace_modules_for_calibration
 from llmcompressor.modifiers.quantization import QuantizationModifier
 from llmcompressor.utils import dispatch_for_generation

@@ -17,6 +18,7 @@ def run(

     # Load model.
     model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
+    model = replace_modules_for_calibration(model)
     print(model)
     tokenizer = AutoTokenizer.from_pretrained(model_name)

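For context, here is a minimal sketch of how the patched script fits together after this change. The default model name, quantization scheme, and save path below are illustrative assumptions, not values taken from the actual file; only the replace_modules_for_calibration call reflects this commit.

# Minimal sketch of the calibration-enabled flow (illustrative; the model name,
# recipe, and save directory are placeholders, not copied from the repository).
import fire
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modeling import replace_modules_for_calibration
from llmcompressor.modifiers.quantization import QuantizationModifier


def run(model_name: str = "meta-llama/Llama-4-Scout-17B-16E-Instruct"):
    # Load model.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

    # The fix from this commit: rewrite the fused MoE blocks into per-expert
    # modules so the expert weights are visible to the quantization pass.
    model = replace_modules_for_calibration(model)

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Placeholder recipe: quantize all Linear layers, skip the LM head.
    recipe = QuantizationModifier(
        targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
    )

    # Apply the recipe and save the compressed checkpoint
    # (following llm-compressor's documented save pattern).
    oneshot(model=model, recipe=recipe)
    save_dir = model_name.split("/")[-1] + "-FP8-Dynamic"
    model.save_pretrained(save_dir, save_compressed=True)
    tokenizer.save_pretrained(save_dir)


if __name__ == "__main__":
    fire.Fire(run)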