[WIP] Fix Qwen Edit Plus modular for multi-image input #12601

sayakpaul · 2025-11-06T13:56:10Z

What does this PR do?

Don't review yet.

sayakpaul · 2025-11-06T13:57:13Z

src/diffusers/modular_pipelines/qwenimage/before_denoise.py

+        vae_scale_factor = components.vae_scale_factor
+        block_state.img_shapes = [
+            [
+                (1, block_state.height // vae_scale_factor // 2, block_state.width // vae_scale_factor // 2),
+                *[
+                    (1, vae_height // vae_scale_factor // 2, vae_width // vae_scale_factor // 2)
+                    for vae_width, vae_height in block_state.vae_image_sizes
+                ],
+            ]


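This is the main difference from the existing RoPE block for Edit: `img_shapes` now holds one entry for the target image plus one entry per input image, each computed from that image's own resolution.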
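For illustration, here is a stdlib-only sketch of the shape bookkeeping in the block above. The function name `compute_img_shapes` is hypothetical, and the default `vae_scale_factor=8` is an assumption. Like the diff, it assumes `vae_image_sizes` stores `(width, height)` pairs and that latents are split into 2x2 patches after VAE downsampling, which is where the extra `// 2` comes from.

```python
from typing import List, Sequence, Tuple


def compute_img_shapes(
    height: int,
    width: int,
    vae_image_sizes: Sequence[Tuple[int, int]],
    vae_scale_factor: int = 8,  # assumed VAE downsampling factor
) -> List[List[Tuple[int, int, int]]]:
    """Hypothetical helper: (frames, latent_h, latent_w) for the target image,
    followed by one entry per input (reference) image."""
    # Target (output) image: VAE downsampling, then 2x2 patchification.
    target = (1, height // vae_scale_factor // 2, width // vae_scale_factor // 2)
    # Each reference image keeps its own resolution, so it gets its own entry.
    references = [
        (1, vae_height // vae_scale_factor // 2, vae_width // vae_scale_factor // 2)
        for vae_width, vae_height in vae_image_sizes  # stored as (width, height)
    ]
    return [[target, *references]]
```

With a 1024x1024 target and two references of 1024x768 and 832x1248 (width x height), the result is `[[(1, 64, 64), (1, 48, 64), (1, 78, 52)]]`. The reference tuples differ from the target's, which a single shared shape cannot represent.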

sayakpaul · 2025-11-06T13:59:19Z

src/diffusers/modular_pipelines/qwenimage/encoders.py

+            vae_image_sizes = []
+            for img in block_state.vae_image:
+                width, height = img.size
+                vae_width, vae_height, _ = calculate_dimensions(self.vae_image_size, width / height)
+                vae_image_sizes.append((vae_width, vae_height))
+                processed_images.append(
+                    components.image_processor.preprocess(image=img, height=vae_height, width=vae_width)
+                )
+            block_state.processed_image = torch.stack(processed_images, dim=0).squeeze(1)


Each input image can have a different resolution, so each one is preprocessed separately at its own target size (this also follows the original implementation). The problem only surfaced once we started testing with multiple input images.
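The sketch below shows the per-image sizing step. It uses plain `(width, height)` tuples instead of PIL images. The body of `calculate_dimensions` is an assumption: it keeps the target pixel area, follows the input's aspect ratio, and snaps both sides to multiples of 32. Check the real helper in the pipeline before relying on exact numbers. `per_image_vae_sizes` and its default `vae_image_size` are hypothetical. The sketch returns a list rather than a stacked tensor because `torch.stack` requires every tensor to have the same shape. A single stacked batch is therefore only possible when all inputs map to the same size.

```python
import math
from typing import List, Optional, Tuple


def calculate_dimensions(target_area: int, ratio: float) -> Tuple[int, int, Optional[int]]:
    # ASSUMED behaviour: preserve the pixel area, match the aspect ratio,
    # and round both sides to the nearest multiple of 32.
    width = math.sqrt(target_area * ratio)
    height = width / ratio
    width = round(width / 32) * 32
    height = round(height / 32) * 32
    return width, height, None


def per_image_vae_sizes(
    image_sizes: List[Tuple[int, int]],
    vae_image_size: int = 1024 * 1024,  # assumed target area
) -> List[Tuple[int, int]]:
    """Hypothetical helper: one (vae_width, vae_height) per input image."""
    sizes = []
    for width, height in image_sizes:
        vae_width, vae_height, _ = calculate_dimensions(vae_image_size, width / height)
        sizes.append((vae_width, vae_height))
    return sizes
```

For a square 512x512 input and a 1600x900 input, this gives `(1024, 1024)` and `(1376, 768)`. Each image would then be preprocessed at its own size, as in the loop in the diff.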

sayakpaul · 2025-11-06T13:59:49Z

src/diffusers/modular_pipelines/qwenimage/inputs.py

+            if self.reshape_to_seq_dim:
+                channels = image_latent_tensor.shape[-1]
+                image_latent_tensor = image_latent_tensor.reshape(1, -1, channels)


This reshape is specific to Qwen Image Edit Plus.
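Here is a NumPy sketch of what the `reshape_to_seq_dim` branch does. It assumes `image_latent_tensor` arrives as packed latents of shape `(num_images, seq_len, channels)`, and the sizes are illustrative only. Because the reshape is row-major, `reshape(1, -1, channels)` gives the same result as concatenating each image's token sequence in order. All images' tokens end up in one sequence with a batch dimension of 1.

```python
import numpy as np

# Illustrative sizes only (assumed): two reference images, each 48x64 latent patches.
num_images, latent_h, latent_w, channels = 2, 48, 64, 64
seq_len = latent_h * latent_w

# Assumed input layout: packed latents, (num_images, seq_len, channels).
rng = np.random.default_rng(0)
image_latent_tensor = rng.standard_normal((num_images, seq_len, channels))

# The branch from the diff: fold the image axis into the sequence axis.
flat = image_latent_tensor.reshape(1, -1, channels)

# Equivalent view: concatenate each image's tokens along the sequence axis.
concat = np.concatenate(list(image_latent_tensor), axis=0)[None]
```

A single `(num_images, seq_len, channels)` array only exists when every image has the same latent size. For inputs of different sizes, concatenating a list of per-image tensors along the sequence axis would be the natural equivalent.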

HuggingFaceDocBuilderDev · 2025-11-06T14:04:29Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul added 2 commits November 6, 2025 17:53

try to fix qwen edit plus multi images (modular)

c6b1283

up

e13e3e4
