fix(quantization): Skip weight initialization for quantized models #41273
+56
−1
Description
This Pull Request fixes a `RuntimeError` that occurs when loading llm-compressor W8A8 quantized models (e.g., `RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8`). The error is caused by an attempt to initialize `int8` weights with `torch.nn.init.normal_()`, which only supports floating-point dtypes.

The issue was traced to the `_initialize_missing_keys` method in `modeling_utils.py`: when `is_quantized` is `True`, the `else` branch still called `self.initialize_weights()`, leading to the `RuntimeError`.
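For context, a minimal sketch (not part of this PR's diff) of the underlying failure: `normal_` is only implemented for floating-point dtypes, so applying it to an `int8` weight tensor raises the error directly.

```python
import torch

# Stand-in for an int8 weight from a W8A8 checkpoint (shape is illustrative).
int8_weight = torch.zeros(4, 4, dtype=torch.int8)

# The generic init path effectively does this, which fails on integer dtypes
# with something like: RuntimeError: "normal_kernel_cpu" not implemented for 'Char'
torch.nn.init.normal_(int8_weight, mean=0.0, std=0.02)
```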
Proposed Change
Added a conditional check `if not is_quantized:` before the call to `self.initialize_weights()` in the `else` branch of the `_initialize_missing_keys` method. This skips weight initialization for quantized models, whose weights are either already defined or will be loaded from a pretrained state dictionary, making random initialization both redundant and problematic.
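A simplified sketch of the guard on a toy module (not the verbatim `modeling_utils.py` code; the real method also handles fast-init bookkeeping and missing-key tracking):

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Toy stand-in for a PreTrainedModel; names and structure are illustrative."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def initialize_weights(self):
        # Random init is only valid for floating-point parameters.
        for p in self.parameters():
            nn.init.normal_(p, mean=0.0, std=0.02)

    def _initialize_missing_keys(self, is_quantized: bool):
        # The guard added by this PR, in simplified form: skip random
        # initialization for quantized models, since their weights are
        # already defined or will be loaded from the checkpoint.
        if not is_quantized:
            self.initialize_weights()


model = TinyModel()
model._initialize_missing_keys(is_quantized=True)   # no-op for quantized models
model._initialize_missing_keys(is_quantized=False)  # standard float init
```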
Related Issue
Closes #39366
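For reference, after this change, loading the checkpoint from the linked issue should no longer hit the dtype error. A sketch under assumptions: the exact Auto class for this model and any required extras (e.g., `compressed-tensors`) are not confirmed here.

```python
from transformers import AutoModelForImageTextToText  # assumed Auto class for this checkpoint

# Previously failed in _initialize_missing_keys with a RuntimeError
# from torch.nn.init.normal_() being applied to int8 weights.
model = AutoModelForImageTextToText.from_pretrained(
    "RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8"
)
```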