[WIP] Fix Qwen Edit Plus modular for multi-image input #12601
vae_scale_factor = components.vae_scale_factor
block_state.img_shapes = [
    [
        (1, block_state.height // vae_scale_factor // 2, block_state.width // vae_scale_factor // 2),
        *[
            (1, vae_height // vae_scale_factor // 2, vae_width // vae_scale_factor // 2)
            for vae_width, vae_height in block_state.vae_image_sizes
        ],
    ]
This is the main difference from the existing RoPE block for Edit.
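To make the shape computation above concrete, here is a minimal standalone sketch. The function name `build_img_shapes` and the default `vae_scale_factor=8` are assumptions for illustration only; the arithmetic mirrors the excerpt, with one entry for the target resolution followed by one entry per input image.

```python
def build_img_shapes(height, width, vae_image_sizes, vae_scale_factor=8):
    """Illustrative stand-in (hypothetical name), not the PR's block.

    vae_image_sizes holds (width, height) pairs, matching the order
    used in the excerpt above.
    """

    def latent_shape(h, w):
        # Downscale by the VAE factor, then halve again as in the excerpt.
        return (1, h // vae_scale_factor // 2, w // vae_scale_factor // 2)

    return [
        [
            latent_shape(height, width),
            *[latent_shape(h, w) for w, h in vae_image_sizes],
        ]
    ]


shapes = build_img_shapes(1024, 1024, [(1024, 768), (512, 512)])
print(shapes)  # [[(1, 64, 64), (1, 48, 64), (1, 32, 32)]]
```

Each input image contributes its own `(1, h, w)` entry, so the list grows with the number of input images.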
vae_image_sizes = []
for img in block_state.vae_image:
    width, height = img.size
    vae_width, vae_height, _ = calculate_dimensions(self.vae_image_size, width / height)
    vae_image_sizes.append((vae_width, vae_height))
    processed_images.append(
        components.image_processor.preprocess(image=img, height=vae_height, width=vae_width)
    )
block_state.processed_image = torch.stack(processed_images, dim=0).squeeze(1)
Each input image can have a different resolution, which is why each one needs its own preprocess call (this also follows the original implementation). This only surfaced once we started testing with multiple input images.
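As a rough illustration of why the sizes differ per image, the sketch below derives a target size from each image's aspect ratio. `target_size_for_area` is a hypothetical stand-in and is not claimed to match `calculate_dimensions`; the fixed pixel area and the rounding to multiples of 32 are assumptions.

```python
import math


def target_size_for_area(target_area, aspect_ratio, multiple=32):
    """Hypothetical helper: pick (width, height) with roughly `target_area`
    pixels and the given width/height ratio, rounded to `multiple`."""
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio
    width = round(width / multiple) * multiple
    height = round(height / multiple) * multiple
    return width, height


def per_image_sizes(image_sizes, target_area=1024 * 1024):
    # image_sizes holds (width, height) pairs, like PIL's Image.size.
    return [target_size_for_area(target_area, w / h) for w, h in image_sizes]


sizes = per_image_sizes([(1000, 1000), (1600, 900)])
print(sizes)  # [(1024, 1024), (1376, 768)]
```

A square input and a 16:9 input end up with different target sizes, so under these assumptions a single shared `height`/`width` would not suit both.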
if self.reshape_to_seq_dim:
    channels = image_latent_tensor.shape[-1]
    image_latent_tensor = image_latent_tensor.reshape(1, -1, channels)
This reshape is specific to Qwen Image Edit Plus.
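For reference, the effect of `reshape(1, -1, channels)` on the tensor's shape can be sketched without any tensor library. The helper name `reshaped_to_seq_dim` and the example input shape `(num_images, seq_len, channels)` are assumptions for illustration; the excerpt does not state the incoming shape.

```python
import math


def reshaped_to_seq_dim(shape):
    """Hypothetical helper: shape produced by reshape(1, -1, channels),
    where channels is the last dimension of `shape`."""
    channels = shape[-1]
    # All leading dimensions collapse into a single sequence dimension.
    seq_len = math.prod(shape[:-1])
    return (1, seq_len, channels)


print(reshaped_to_seq_dim((2, 4096, 64)))  # (1, 8192, 64)
```

Under that assumed input shape, the latents of all images land in one sequence with a batch dimension of 1.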
What does this PR do?
Don't review yet.