
Commit 7df0289

mgoin and yewentao256 authored
Change warning logs to debug for unimplemented MXFP4 Linear/Attention (#29441)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
1 parent 0abc794 commit 7df0289

File tree

1 file changed (+6, -4 lines)

  • vllm/model_executor/layers/quantization

vllm/model_executor/layers/quantization/mxfp4.py

Lines changed: 6 additions & 4 deletions
@@ -196,9 +196,10 @@ def get_quant_method(
             # TODO: Add support for MXFP4 Linear Method.
             # MXFP4 LinearMethod is available in AMD-Quark, refer to that implementation
             # if you are interested in enabling MXFP4 here.
-            logger.warning_once(
+            logger.debug_once(
                 "MXFP4 linear layer is not implemented - falling back to "
-                "UnquantizedLinearMethod."
+                "UnquantizedLinearMethod.",
+                scope="local",
             )
             return UnquantizedLinearMethod()
         elif isinstance(layer, FusedMoE):
@@ -208,9 +209,10 @@ def get_quant_method(
             return Mxfp4MoEMethod(layer.moe_config)
         elif isinstance(layer, Attention):
             # TODO: Add support for MXFP4 Attention.
-            logger.warning_once(
+            logger.debug_once(
                 "MXFP4 attention layer is not implemented. "
-                "Skipping quantization for this layer."
+                "Skipping quantization for this layer.",
+                scope="local",
             )
             return None
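For context, the `warning_once` / `debug_once` helpers on vLLM's logger deduplicate a message so it is emitted only once rather than once per layer, and downgrading to debug presumably keeps the notice available to developers without warning every user whose model simply has no MXFP4 Linear/Attention kernel yet. The snippet below is a minimal sketch of that once-per-process ("local" scope) pattern using only the standard library; `debug_once` and `_log_once` here are illustrative stand-ins, not vLLM's actual implementation.

import functools
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("mxfp4_demo")


@functools.lru_cache(maxsize=None)
def _log_once(level: int, msg: str) -> None:
    # lru_cache keys on (level, msg), so each unique message is logged only
    # the first time this process sees it -- roughly a "local" (per-process) scope.
    logger.log(level, msg)


def debug_once(msg: str) -> None:
    # Illustrative stand-in for the logger.debug_once(...) call in the diff above.
    _log_once(logging.DEBUG, msg)


for _ in range(3):
    # Despite three calls, the message appears once, so an unimplemented
    # MXFP4 fallback does not flood the log.
    debug_once(
        "MXFP4 linear layer is not implemented - falling back to "
        "UnquantizedLinearMethod."
    )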
