Commit 98c088c
[None][fix] Update the attention layers counting for Qwen3-next.
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
1 parent: 264d38e

File tree

1 file changed: +6 additions, -0 deletions

tensorrt_llm/_torch/model_config.py (6 additions, 0 deletions)

@@ -642,5 +642,11 @@ def get_layer_types(self) -> Optional[List[LayerTypeCpp]]:
     def get_num_attention_layers(self):
         if is_nemotron_hybrid(self.pretrained_config):
             return self.pretrained_config.hybrid_override_pattern.count("*")
+        elif self.pretrained_config.architectures[0] in [
+                "Qwen3NextForCausalLM"
+        ]:
+            # Qwen3NextForCausalLM has a hybrid attention pattern (1:3 full
+            # attention to linear attention), so we need to compute the number
+            # of full-attention layers.
+            return self.pretrained_config.num_hidden_layers // self.pretrained_config.full_attention_interval
         else:
             return self.pretrained_config.num_hidden_layers
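The arithmetic behind the new branch can be sketched in isolation. The snippet below is a minimal, self-contained illustration of the counting logic, using a stand-in config object with hypothetical values (48 layers, one full-attention layer every 4 layers); the real code reads these fields from `pretrained_config`:

```python
# Sketch of the layer-counting logic from the diff above, using a stand-in
# config object (assumed values; the real code uses pretrained_config).
from types import SimpleNamespace

def get_num_attention_layers(config):
    # Qwen3-Next interleaves one full-attention layer per
    # `full_attention_interval` layers (the rest use linear attention),
    # so integer division gives the full-attention layer count.
    if config.architectures[0] == "Qwen3NextForCausalLM":
        return config.num_hidden_layers // config.full_attention_interval
    # Non-hybrid models: every layer is a full-attention layer.
    return config.num_hidden_layers

# Hypothetical Qwen3-Next-style config: 48 layers, interval of 4.
qwen3_next = SimpleNamespace(
    architectures=["Qwen3NextForCausalLM"],
    num_hidden_layers=48,
    full_attention_interval=4,
)
# Hypothetical dense model for contrast.
dense = SimpleNamespace(architectures=["LlamaForCausalLM"], num_hidden_layers=32)

print(get_num_attention_layers(qwen3_next))  # 12 full-attention layers
print(get_num_attention_layers(dense))       # 32
```

With a 1:3 full-to-linear ratio, only a quarter of the 48 layers carry full attention, which is what the integer division by `full_attention_interval` captures.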
