Modify image embedding return for Llava compatibility (#147)

larryliu0820 · web-flow · commit 8335b491fa77 · 2025-10-02T15:24:21.000-07:00
* Modify image embedding return for Llava compatibility

  For `Gemma3` we don't need to unsqueeze the image embedding. For `Llava` we get a list of 2D tensors, and here I'm assuming it contains only one tensor.
  https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/modeling_llava.py#L214

* Address comments
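
To make the shape handling described above concrete, here is a minimal sketch of the two return paths. The helper name `normalize_image_embeds` and the tensor sizes are hypothetical, chosen only for illustration; it also assumes that the `Gemma3` path already yields a batched 3D tensor, which the commit message implies but does not state.

```python
import torch


def normalize_image_embeds(image_embeds):
    """Hypothetical helper mirroring the return logic in this change."""
    # Llava-style input: a list of 2D tensors. Stacking adds a leading
    # dimension, so a one-element list becomes a (1, tokens, hidden) tensor.
    if isinstance(image_embeds, list):
        image_embeds = torch.stack(image_embeds)
    # Gemma3-style input (assumed already batched): returned unchanged.
    return image_embeds


# Sizes below are made up for illustration only.
llava_like = [torch.zeros(4, 8)]    # list holding one (tokens, hidden) tensor
gemma_like = torch.zeros(1, 4, 8)   # assumed (batch, tokens, hidden) tensor

llava_out = normalize_image_embeds(llava_like)
gemma_out = normalize_image_embeds(gemma_like)
```

Under these assumptions both paths end up with the same rank. Note that `torch.stack` requires every tensor in the list to have the same shape, so a list with several differently sized embeddings would raise an error rather than be silently merged.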
diff --git a/optimum/exporters/executorch/integrations.py b/optimum/exporters/executorch/integrations.py
@@ -77,7 +77,9 @@ def forward(
         input_features: torch.FloatTensor,
     ):
         image_embeds = self.model.get_image_features(input_features)
-        return image_embeds.unsqueeze(0)
+        if isinstance(image_embeds, list):
+            image_embeds = torch.stack(image_embeds)
+        return image_embeds
 
 
 class AudioExportableModule(torch.nn.Module):