
Commit d006d84

fix(quantization): Skip weight initialization for quantized models

This commit addresses the RuntimeError encountered when loading llmcompressor W8A8 quantized models, where `torch.nn.init.normal_()` is called on `int8` tensors during weight initialization. The `_initialize_missing_keys` method in `modeling_utils.py` was unconditionally calling `self.initialize_weights()`. For quantized models, this initialization is unnecessary and causes a `RuntimeError`, as `normal_()` does not support integer dtypes.

By adding a check `if not is_quantized:` before calling `self.initialize_weights()`, we ensure that this problematic initialization step is skipped for quantized models, resolving the `RuntimeError` and improving compatibility with `llmcompressor` W8A8 models.

Fixes #39366

Signed-off-by: Mauricio Harley <mauricioharley@gmail.com>
1 parent 9f2d566 commit d006d84
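
The failure the message describes can be reproduced with stock PyTorch alone. The snippet below is a minimal illustrative sketch (not code from this commit) showing why `torch.nn.init.normal_()` cannot be applied to `int8` weights, and why the non-quantized float path is unaffected.

```python
import torch

# Illustrative sketch (not from this commit): normal_() is only implemented for
# floating-point dtypes, so calling it on an int8 tensor (as happens when the
# missing keys of a W8A8 checkpoint are re-initialized) raises a RuntimeError.
int8_weight = torch.zeros(4, 4, dtype=torch.int8)
try:
    torch.nn.init.normal_(int8_weight)
except RuntimeError as err:
    print(f"RuntimeError: {err}")  # e.g. normal_ kernel not implemented for 'Char'

# The same call succeeds on a float tensor, i.e. the non-quantized path.
fp32_weight = torch.zeros(4, 4, dtype=torch.float32)
torch.nn.init.normal_(fp32_weight)
```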

File tree

1 file changed (+2, -1 lines)

src/transformers/modeling_utils.py

Lines changed: 2 additions & 1 deletion
@@ -5665,7 +5665,8 @@ def set_is_initialized_for_modules(module):
             with deepspeed.zero.GatheredParameters(not_initialized_parameters, modifier_rank=0):
                 self.initialize_weights()
         else:
-            self.initialize_weights()
+            if not is_quantized:
+                self.initialize_weights()
 
     def _adjust_missing_and_unexpected_keys(
         self, missing_keys: list[str], unexpected_keys: list[str], loading_task_model_from_base_state_dict: bool
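
With the guard in place, loading a W8A8 checkpoint through the usual `from_pretrained` path should no longer trip the initialization error. A hedged usage sketch follows; the model id is a placeholder, not a checkpoint referenced by this commit or issue #39366.

```python
from transformers import AutoModelForCausalLM

# Hedged usage sketch: "example-org/llama-w8a8" is a hypothetical placeholder,
# not a real checkpoint named in this commit. After the fix, a quantized model's
# missing keys are no longer run through initialize_weights(), so its int8
# weights are never handed to torch.nn.init.normal_().
model = AutoModelForCausalLM.from_pretrained("example-org/llama-w8a8")
```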
