
ValueError: Unrecognized configuration class <class 'transformers.models.qwen3_omni_moe.configuration_qwen3_omni_moe.Qwen3OmniMoeConfig'> for this kind of AutoModel: AutoModel. #42032

@Tortoise17

Description


System Info

I started testing the Qwen3-Omni model when transformers 4.56.0 was the latest release, and that version had issues with this model. Those issues were fixed in commits targeting transformers 4.57.0, which at the time were only available on git. Now that updated transformers versions are available both on pip and on git, the model again fails to load. I have tried all the available versions, and they all raise the same error.

The error produced when loading the model after the transformers update is given below:

```
Starting to load model ../Qwen3-Omni-30B-A3B-Thinking...
(VllmWorkerProcess pid=781008) WARNING 11-05 11:35:04 [utils.py:196] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
(VllmWorkerProcess pid=781008) INFO 11-05 11:35:04 [transformers.py:400] Using Transformers backend.
(VllmWorkerProcess pid=781007) WARNING 11-05 11:35:04 [utils.py:196] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
(VllmWorkerProcess pid=781007) INFO 11-05 11:35:04 [transformers.py:400] Using Transformers backend.
(VllmWorkerProcess pid=781006) WARNING 11-05 11:35:04 [utils.py:196] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
(VllmWorkerProcess pid=781006) INFO 11-05 11:35:04 [transformers.py:400] Using Transformers backend.
WARNING 11-05 11:35:04 [utils.py:196] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 11-05 11:35:04 [transformers.py:400] Using Transformers backend.
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] Traceback (most recent call last):
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/executor/multiproc_worker_utils.py", line 226, in _run_worker_process
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/utils/init.py", line 3007, in run_method
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] return func(*args, **kwargs)
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/worker/worker.py", line 211, in load_model
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] self.model_runner.load_model()
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/worker/model_runner.py", line 1083, in load_model
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] self.model = get_model(vllm_config=self.vllm_config)
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/model_executor/model_loader/init.py", line 118, in get_model
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] return loader.load_model(vllm_config=vllm_config,
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] model = initialize_model(vllm_config=vllm_config,
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] return model_class(vllm_config=vllm_config, prefix=prefix)
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/model_executor/models/transformers.py", line 737, in init
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] super().init(vllm_config=vllm_config, prefix=prefix)
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/compilation/decorators.py", line 183, in init
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/model_executor/models/transformers.py", line 661, in init
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] super().init(vllm_config=vllm_config, prefix=prefix)
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/Qwen3-Omni/vllm/model_executor/models/transformers.py", line 423, in init
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] self.model: PreTrainedModel = AutoModel.from_config(
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] File "/anaconda3/envs/vis3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 458, in from_config
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] raise ValueError(
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] ValueError: Unrecognized configuration class <class 'transformers.models.qwen3_omni_moe.configuration_qwen3_omni_moe.Qwen3OmniMoeConfig'> for this kind of AutoModel: AutoModel.
(VllmWorkerProcess pid=781007) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] Model type should be one of Aimv2Config, Aimv2VisionConfig, AlbertConfig, AlignConfig, AltCLIPConfig, ApertusConfig, ArceeConfig, AriaConfig, AriaTextConfig, ASTConfig, AutoformerConfig, AyaVisionConfig, BambaConfig, BarkConfig, BartConfig, BeitConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitConfig, BitNetConfig, BlenderbotConfig, BlenderbotSmallConfig, BlipConfig, Blip2Config, Blip2QFormerConfig, BloomConfig, BltConfig, BridgeTowerConfig, BrosConfig, CamembertConfig, CanineConfig, ChameleonConfig, ChineseCLIPConfig, ChineseCLIPVisionConfig, ClapConfig, CLIPConfig, CLIPTextConfig, CLIPVisionConfig, CLIPSegConfig, ClvpConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, Cohere2VisionConfig, ConditionalDetrConfig, ConvBertConfig, ConvNextConfig, ConvNextV2Config, CpmAntConfig, CsmConfig, CTRLConfig, CvtConfig, DFineConfig, DabDetrConfig, DacConfig, Data2VecAudioConfig, Data2VecTextConfig, Data2VecVisionConfig, DbrxConfig, DebertaConfig, DebertaV2Config, DecisionTransformerConfig, DeepseekV2Config, DeepseekV3Config, DeepseekVLConfig, DeepseekVLHybridConfig, DeformableDetrConfig, DeiTConfig, DepthProConfig, DetaConfig, DetrConfig, DiaConfig, DiffLlamaConfig, DinatConfig, Dinov2Config, Dinov2WithRegistersConfig, DINOv3ConvNextConfig, DINOv3ViTConfig, DistilBertConfig, DogeConfig, DonutSwinConfig, Dots1Config, DPRConfig, DPTConfig, EdgeTamConfig, EdgeTamVideoConfig, EdgeTamVisionConfig, EfficientFormerConfig, EfficientLoFTRConfig, EfficientNetConfig, ElectraConfig, Emu3Config, EncodecConfig, ErnieConfig, Ernie4_5Config, Ernie4_5_MoeConfig, ErnieMConfig, EsmConfig, EvollaConfig, Exaone4Config, FalconConfig, FalconH1Config, FalconMambaConfig, FastSpeech2ConformerConfig, FastSpeech2ConformerWithHifiGanConfig, FlaubertConfig, FlavaConfig, FlexOlmoConfig, Florence2Config, FNetConfig, FocalNetConfig, FSMTConfig, FunnelConfig, 
FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, Gemma3nConfig, Gemma3nAudioConfig, Gemma3nTextConfig, Gemma3nVisionConfig, GitConfig, GlmConfig, Glm4Config, Glm4MoeConfig, Glm4vConfig, Glm4vMoeConfig, Glm4vMoeTextConfig, Glm4vTextConfig, GLPNConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GptOssConfig, GPTJConfig, GPTSanJapaneseConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeHybridConfig, GraniteMoeSharedConfig, GraphormerConfig, GroundingDinoConfig, GroupViTConfig, HeliumConfig, HGNetV2Config, HieraConfig, HubertConfig, HunYuanDenseV1Config, HunYuanMoEV1Config, IBertConfig, IdeficsConfig, Idefics2Config, Idefics3Config, Idefics3VisionConfig, IJepaConfig, ImageGPTConfig, InformerConfig, InstructBlipConfig, InstructBlipVideoConfig, InternVLConfig, InternVLVisionConfig, JambaConfig, JanusConfig, JetMoeConfig, JukeboxConfig, Kosmos2Config, Kosmos2_5Config, KyutaiSpeechToTextConfig, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LevitConfig, Lfm2Config, Lfm2VlConfig, LightGlueConfig, LiltConfig, LlamaConfig, Llama4Config, Llama4TextConfig, LlavaConfig, LlavaNextConfig, LlavaNextVideoConfig, LlavaOnevisionConfig, LongcatFlashConfig, LongformerConfig, LongT5Config, LukeConfig, LxmertConfig, M2M100Config, MambaConfig, Mamba2Config, MarianConfig, MarkupLMConfig, Mask2FormerConfig, MaskFormerConfig, MaskFormerSwinConfig, MBartConfig, MCTCTConfig, MegaConfig, MegatronBertConfig, MetaClip2Config, MgpstrConfig, MimiConfig, MiniMaxConfig, MinistralConfig, MistralConfig, Mistral3Config, MixtralConfig, MLCDVisionConfig, MllamaConfig, MMGroundingDinoConfig, MobileBertConfig, MobileNetV1Config, MobileNetV2Config, MobileViTConfig, MobileViTV2Config, ModernBertConfig, ModernBertDecoderConfig, MoonshineConfig, MoshiConfig, MPNetConfig, MptConfig, MraConfig, MT5Config, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NatConfig, NemotronConfig, NezhaConfig, NllbMoeConfig, 
NystromformerConfig, OlmoConfig, Olmo2Config, Olmo3Config, OlmoeConfig, OmDetTurboConfig, OneFormerConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, Ovis2Config, Owlv2Config, OwlViTConfig, PaliGemmaConfig, ParakeetCTCConfig, ParakeetEncoderConfig, PatchTSMixerConfig, PatchTSTConfig, PegasusConfig, PegasusXConfig, PerceiverConfig, TimmWrapperConfig, PerceptionLMConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PixtralVisionConfig, PLBartConfig, PoolFormerConfig, ProphetNetConfig, PvtConfig, PvtV2Config, QDQBertConfig, Qwen2Config, Qwen2_5_VLConfig, Qwen2_5_VLTextConfig, Qwen2AudioEncoderConfig, Qwen2MoeConfig, Qwen2VLConfig, Qwen2VLTextConfig, Qwen3Config, Qwen3MoeConfig, Qwen3NextConfig, Qwen3VLConfig, Qwen3VLMoeConfig, Qwen3VLMoeTextConfig, Qwen3VLTextConfig, RecurrentGemmaConfig, ReformerConfig, RegNetConfig, RemBertConfig, ResNetConfig, RetriBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RTDetrConfig, RTDetrV2Config, RwkvConfig, SamConfig, Sam2Config, Sam2HieraDetConfig, Sam2VideoConfig, Sam2VisionConfig, SamHQConfig, SamHQVisionConfig, SamVisionConfig, SeamlessM4TConfig, SeamlessM4Tv2Config, SeedOssConfig, SegformerConfig, SegGptConfig, SEWConfig, SEWDConfig, SiglipConfig, Siglip2Config, Siglip2VisionConfig, SiglipVisionConfig, SmolLM3Config, SmolVLMConfig, SmolVLMVisionConfig, Speech2TextConfig, SpeechT5Config, SplinterConfig, SqueezeBertConfig, StableLmConfig, Starcoder2Config, SwiftFormerConfig, SwinConfig, Swin2SRConfig, Swinv2Config, SwitchTransformersConfig, T5Config, T5GemmaConfig, TableTransformerConfig, TapasConfig, TextNetConfig, TimeSeriesTransformerConfig, TimesFmConfig, TimesformerConfig, TimmBackboneConfig, TimmWrapperConfig, TrajectoryTransformerConfig, TransfoXLConfig, TvltConfig, TvpConfig, UdopConfig, UMT5Config, UniSpeechConfig, UniSpeechSatConfig, UnivNetConfig, VanConfig, VaultGemmaConfig, VideoLlavaConfig, VideoMAEConfig, ViltConfig, VipLlavaConfig, 
VisionTextDualEncoderConfig, VisualBertConfig, ViTConfig, ViTHybridConfig, ViTMAEConfig, ViTMSNConfig, VitDetConfig, VitsConfig, VivitConfig, VJEPA2Config, VoxtralConfig, VoxtralEncoderConfig, Wav2Vec2Config, Wav2Vec2BertConfig, Wav2Vec2ConformerConfig, WavLMConfig, WhisperConfig, XCLIPConfig, XcodecConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, xLSTMConfig, XmodConfig, YolosConfig, YosoConfig, ZambaConfig, Zamba2Config.
(VllmWorkerProcess pid=781008) ERROR 11-05 11:35:05 [multiproc_worker_utils.py:232] Exception in worker VllmWorkerProcess while processing method load_model.
[... identical traceback, ValueError, and supported-config list as above, repeated for pid=781008 ...]
[rank0]: Traceback (most recent call last):
[rank0]: File "/Qwen3-Omni/web_demo.py", line 394, in
[rank0]: model, processor = _load_model_processor(args)
[rank0]: File "/Qwen3-Omni/web_demo.py", line 38, in _load_model_processor
[rank0]: model = LLM(
[rank0]: File "/Qwen3-Omni/vllm/entrypoints/llm.py", line 285, in init
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/Qwen3-Omni/vllm/engine/llm_engine.py", line 490, in from_engine_args
[rank0]: return engine_cls.from_vllm_config(
[rank0]: File "/Qwen3-Omni/vllm/engine/llm_engine.py", line 466, in from_vllm_config
[rank0]: return cls(
[rank0]: File "/Qwen3-Omni/vllm/engine/llm_engine.py", line 257, in init
[rank0]: self.model_executor = executor_class(vllm_config=vllm_config)
[rank0]: File "/Qwen3-Omni/vllm/executor/executor_base.py", line 264, in init
[rank0]: super().init(*args, **kwargs)
[rank0]: File "/Qwen3-Omni/vllm/executor/executor_base.py", line 54, in init
[rank0]: self._init_executor()
[rank0]: File "/Qwen3-Omni/vllm/executor/mp_distributed_executor.py", line 126, in _init_executor
[rank0]: self._run_workers("load_model",
[rank0]: File "/Qwen3-Omni/vllm/executor/mp_distributed_executor.py", line 186, in _run_workers
[rank0]: driver_worker_output = run_method(self.driver_worker, sent_method,
[rank0]: File "/Qwen3-Omni/vllm/utils/init.py", line 3007, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: File "/Qwen3-Omni/vllm/worker/worker.py", line 211, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/Qwen3-Omni/vllm/worker/model_runner.py", line 1083, in load_model
[rank0]: self.model = get_model(vllm_config=self.vllm_config)
[rank0]: File "/Qwen3-Omni/vllm/model_executor/model_loader/init.py", line 118, in get_model
[rank0]: return loader.load_model(vllm_config=vllm_config,
[rank0]: File "/Qwen3-Omni/vllm/model_executor/model_loader/base_loader.py", line 44, in load_model
[rank0]: model = initialize_model(vllm_config=vllm_config,
[rank0]: File "/Qwen3-Omni/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
[rank0]: return model_class(vllm_config=vllm_config, prefix=prefix)
[rank0]: File "/Qwen3-Omni/vllm/model_executor/models/transformers.py", line 737, in init
[rank0]: super().init(vllm_config=vllm_config, prefix=prefix)
[rank0]: File "/Qwen3-Omni/vllm/compilation/decorators.py", line 183, in init
[rank0]: old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
[rank0]: File "/Qwen3-Omni/vllm/model_executor/models/transformers.py", line 661, in init
[rank0]: super().init(vllm_config=vllm_config, prefix=prefix)
[rank0]: File "/Qwen3-Omni/vllm/model_executor/models/transformers.py", line 423, in init
[rank0]: self.model: PreTrainedModel = AutoModel.from_config(
[rank0]: File "/anaconda3/envs/vis3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 458, in from_config
[rank0]: raise ValueError(
[rank0]: ValueError: Unrecognized configuration class <class 'transformers.models.qwen3_omni_moe.configuration_qwen3_omni_moe.Qwen3OmniMoeConfig'> for this kind of AutoModel: AutoModel.
[rank0]: Model type should be one of Aimv2Config, Aimv2VisionConfig, AlbertConfig, ... [same list of supported config classes as above; Qwen3OmniMoeConfig is not among them].
[rank0]:[W1105 11:35:06.754006060 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
/anaconda3/envs/vis3/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

```
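The root cause is visible in the last frames of the traceback: vLLM's Transformers fallback calls `AutoModel.from_config(...)`, and the plain `AutoModel` factory only resolves model types present in its internal config-to-class mapping; `qwen3_omni_moe` is absent from that mapping, so the lookup raises. The failure mode can be sketched with a toy version of that lookup (illustrative names only, not the real transformers registry):

```python
# Toy sketch of the Auto-factory lookup that fails in the log above.
# MODEL_MAPPING here is an illustrative subset, NOT the real transformers mapping.
MODEL_MAPPING = {
    "llama": "LlamaModel",
    "qwen3_moe": "Qwen3MoeModel",
}

def from_config(model_type: str) -> str:
    """Resolve a model class name for a config's model_type, like AutoModel.from_config."""
    try:
        return MODEL_MAPPING[model_type]
    except KeyError:
        # Mirrors the shape of the ValueError in the traceback above.
        raise ValueError(
            f"Unrecognized configuration class for this kind of AutoModel: {model_type!r}. "
            f"Model type should be one of {sorted(MODEL_MAPPING)}."
        ) from None

print(from_config("qwen3_moe"))   # resolves: Qwen3MoeModel
# from_config("qwen3_omni_moe")   # raises ValueError, matching the log above
```

The fix therefore has to come from a transformers build whose Auto mappings include the `qwen3_omni_moe` entry, not from vLLM alone.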

Who can help?

I have cloned the Qwen3-Omni repo.

I have downloaded the Qwen3-Omni model locally from Hugging Face and run:

$ python web_demo.py

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Download Qwen3-Omni.
  2. Download the Qwen3-Omni-Instruct or Qwen3-Omni-Thinking model from Hugging Face.
  3. Install the requirements following the Qwen3-Omni git instructions, using the command lines below:
git clone -b qwen3_omni https://github.com/wangxiongts/vllm.git
cd vllm
pip install -r requirements/build.txt
pip install -r requirements/cuda.txt
export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/a5dd03c1ebc5e4f56f3c9d3dc0436e9c582c978f/vllm-0.9.2-cp38-abi3-manylinux1_x86_64.whl
VLLM_USE_PRECOMPILED=1 pip install -e . -v --no-build-isolation
# If you meet an "Undefined symbol" error while using VLLM_USE_PRECOMPILED=1, please use "pip install -e . -v" to build from source.
# Install the Transformers
pip install git+https://github.com/huggingface/transformers
pip install accelerate
pip install qwen-omni-utils -U
pip install -U flash-attn --no-build-isolation

pip install gradio==5.44.1 gradio_client==1.12.1 soundfile==0.13.1

Then, in that environment, run the command below:

$ python web_demo.py
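Before launching the demo, it may help to confirm which transformers build is actually active in the environment and whether it registers the `qwen3_omni_moe` model type at all. A minimal check (assuming `CONFIG_MAPPING_NAMES`, which recent transformers versions expose in `transformers.models.auto.configuration_auto`; the check degrades gracefully if transformers is not installed):

```python
# Sanity check: is transformers importable, and does it know the qwen3_omni_moe model type?
import importlib.util

spec = importlib.util.find_spec("transformers")
installed = spec is not None
print("transformers installed:", installed)

if installed:
    import transformers
    print("version:", transformers.__version__)
    # AutoConfig/AutoModel can only resolve model types present in this mapping.
    from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES
    print("qwen3_omni_moe registered:", "qwen3_omni_moe" in CONFIG_MAPPING_NAMES)
```

If the last line prints `False`, the installed build predates Qwen3-Omni support and the `AutoModel` error above is expected regardless of the vLLM side.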

Expected behavior

Normally, it should load the model without issues, start the server, and run inference on the uploaded video for analysis.
