Commit e0f6965

[None][fix] Update the attention layers counting for Qwen3-next. (#9072)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
1 parent 2854f0c commit e0f6965

File tree

1 file changed

+7
-0
lines changed


tensorrt_llm/_torch/model_config.py

Lines changed: 7 additions & 0 deletions
@@ -605,5 +605,12 @@ def get_layer_types(self) -> Optional[List[LayerTypeCpp]]:
     def get_num_attention_layers(self):
         if is_nemotron_hybrid(self.pretrained_config):
             return self.pretrained_config.hybrid_override_pattern.count("*")
+        elif hasattr(
+                self.pretrained_config, "architectures"
+        ) and self.pretrained_config.architectures is not None and self.pretrained_config.architectures[
+                0] in ["Qwen3NextForCausalLM"]:
+            # Qwen3NextForCausalLM has a hybrid attention pattern (1:3 full attention : linear attention),
+            # so we need to count only the full-attention layers.
+            return self.pretrained_config.num_hidden_layers // self.pretrained_config.full_attention_interval
         else:
             return self.pretrained_config.num_hidden_layers

0 commit comments
