Commit 4cfc0e6
ignore _update_mamba_mask for AWQ sequential tracing (#1925)
SUMMARY:
In models with mamba-2 layers, e.g.
[nvidia/NVIDIA-Nemotron-Nano-12B-v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2) and
[Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct),
tracing `_update_mamba_mask` leads to
```
File "NemotronHModel_8045287568680_autowrapped", line 57, in forward
File "/mnt/LinuxDrive/huggingface/modules/transformers_modules/NVIDIA_hyphen_Nemotron_hyphen_Nano_hyphen_12B_hyphen_v2/modeling_nemotron_h.py", line 1461, in _update_mamba_mask
if cache_position[0] > 0 or (attention_mask is not None and torch.all(attention_mask == 1)):
^^^^^^^^^^^^^^^^^^^^^
File "/home/toncao/anaconda3/envs/llm-compressor_v1/lib/python3.12/site-packages/transformers/utils/fx.py", line 674, in __bool__
return super().__bool__()
^^^^^^^^^^^^^^^^^^
File "/home/toncao/anaconda3/envs/llm-compressor_v1/lib/python3.12/site-packages/torch/fx/proxy.py", line 577, in __bool__
return self.tracer.to_bool(self)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/toncao/anaconda3/envs/llm-compressor_v1/lib/python3.12/site-packages/torch/fx/proxy.py", line 388, in to_bool
raise TraceError(
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
```
from the function:
```
def _update_mamba_mask(self, attention_mask, cache_position):
    """
    No need for zeroing states when
        1. Cached forward
        2. Attending to all inputs
    """
    mamba_mask = attention_mask
    if cache_position[0] > 0 or (attention_mask is not None and torch.all(attention_mask == 1)):
        mamba_mask = None
    return mamba_mask
```
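The failure is inherent to `torch.fx` symbolic tracing: during tracing, tensor values are replaced by proxies, and a proxy cannot be converted to a Python `bool`, so any data-dependent `if` raises `TraceError`. A minimal standalone sketch (not from this repo) that reproduces the same error class as `_update_mamba_mask`'s `cache_position[0] > 0` branch:

```python
import torch
import torch.fx


class TinyModel(torch.nn.Module):
    def forward(self, cache_position: torch.Tensor) -> torch.Tensor:
        # Data-dependent control flow: under symbolic tracing,
        # `cache_position[0] > 0` is a Proxy, and calling bool() on it
        # raises torch.fx.proxy.TraceError.
        if cache_position[0] > 0:
            return cache_position * 2
        return cache_position


try:
    torch.fx.symbolic_trace(TinyModel())
except torch.fx.proxy.TraceError as err:
    print(f"TraceError: {err}")
```

Skipping such functions during tracing (as this PR does via the ignore list) sidesteps the branch entirely, which is safe here because the mask logic does not need to be captured in the traced graph.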
Adding `_update_mamba_mask` to the tracing ignore list therefore makes
AWQ sequential tracing work.
TEST PLAN:
Local `make test` results:
```
===================================================== short test summary info =====================================================
FAILED tests/llmcompressor/modeling/test_calib_deepseek_v3.py::test_calib_deepseekv3_module - torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 56.00 MiB. GPU 0 has a total capacity of 23.57 GiB of which 14.1...
FAILED tests/llmcompressor/utils/test_helpers.py::test_disable_cache[MllamaForConditionalGeneration-meta-llama/Llama-3.2-11B-Vision-Instruct] - huggingface_hub.errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-68ee275c-378c35b1649b823602164fc0;24ebe331-9031-4...
FAILED tests/lmeval/test_lmeval.py::TestLMEval::test_lm_eval[None] - TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'NoneType'
====================================== 3 failed, 242 passed, 4 skipped in 129.47s (0:02:09) =======================================
```
Co-authored-by: toncao <cpatonn@gmail.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
1 file changed: +1, −0 lines