
Commit a1b0b40

[None][fix] Update the attention layers counting for Qwen3-next.
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Parent: 0ce22ce

1 file changed: 7 additions, 0 deletions


tensorrt_llm/_torch/model_config.py

@@ -642,5 +642,12 @@ def get_layer_types(self) -> Optional[List[LayerTypeCpp]]:
     def get_num_attention_layers(self):
         if is_nemotron_hybrid(self.pretrained_config):
             return self.pretrained_config.hybrid_override_pattern.count("*")
+        elif hasattr(self.pretrained_config, "architectures"
+                     ) and self.pretrained_config.architectures[0] in [
+                         "Qwen3NextForCausalLM"
+                     ]:
+            # Qwen3NextForCausalLM has a hybrid attention pattern
+            # (1:3 full attention : linear attention), so we need to count
+            # only the full-attention layers.
+            return self.pretrained_config.num_hidden_layers // self.pretrained_config.full_attention_interval
         else:
             return self.pretrained_config.num_hidden_layers
