-
Notifications
You must be signed in to change notification settings - Fork 61
Add image file viewing support to FileEditor #1016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 7 commits
bc9e673
68c990a
33f0a3f
9735516
2f0e725
a883f91
d1f4666
384522b
5fa71b8
1b38b0e
d7e27e8
eb9373b
81377cb
a1801b8
3f908b9
f783fcf
e7666de
262bd60
dac9357
fe9257d
07ab4b2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,3 +1,4 @@ | ||||||||||||||||||||||
| import base64 | ||||||||||||||||||||||
| import os | ||||||||||||||||||||||
| import re | ||||||||||||||||||||||
| import shutil | ||||||||||||||||||||||
|
|
@@ -7,6 +8,7 @@ | |||||||||||||||||||||
|
|
||||||||||||||||||||||
| from binaryornot.check import is_binary | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| from openhands.sdk import ImageContent, TextContent | ||||||||||||||||||||||
| from openhands.sdk.logger import get_logger | ||||||||||||||||||||||
| from openhands.sdk.utils.truncate import maybe_truncate | ||||||||||||||||||||||
| from openhands.tools.file_editor.definition import ( | ||||||||||||||||||||||
|
|
@@ -36,6 +38,9 @@ | |||||||||||||||||||||
|
|
||||||||||||||||||||||
| logger = get_logger(__name__) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| # Supported image extensions for viewing as base64-encoded content | ||||||||||||||||||||||
| IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"} | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| class FileEditor: | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
|
|
@@ -327,6 +332,42 @@ def view( | |||||||||||||||||||||
| prev_exist=True, | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| # Check if the file is an image | ||||||||||||||||||||||
| file_extension = path.suffix.lower() | ||||||||||||||||||||||
| if file_extension in IMAGE_EXTENSIONS: | ||||||||||||||||||||||
| # Read image file as base64 | ||||||||||||||||||||||
| try: | ||||||||||||||||||||||
| with open(path, "rb") as f: | ||||||||||||||||||||||
| image_bytes = f.read() | ||||||||||||||||||||||
| image_base64 = base64.b64encode(image_bytes).decode("utf-8") | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| mime_type = "image/png" # default | ||||||||||||||||||||||
| if image_base64.startswith("/9j/"): | ||||||||||||||||||||||
| mime_type = "image/jpeg" | ||||||||||||||||||||||
| elif image_base64.startswith("iVBORw0KGgo"): | ||||||||||||||||||||||
| mime_type = "image/png" | ||||||||||||||||||||||
| elif image_base64.startswith("R0lGODlh"): | ||||||||||||||||||||||
| mime_type = "image/gif" | ||||||||||||||||||||||
| elif image_base64.startswith("UklGR"): | ||||||||||||||||||||||
| mime_type = "image/webp" | ||||||||||||||||||||||
| elif image_base64.startswith("Qk"): | ||||||||||||||||||||||
| mime_type = "image/bmp" | ||||||||||||||||||||||
|
||||||||||||||||||||||
| if self.screenshot_data: | |
| mime_type = "image/png" | |
| if self.screenshot_data.startswith("/9j/"): | |
| mime_type = "image/jpeg" | |
| elif self.screenshot_data.startswith("iVBORw0KGgo"): | |
| mime_type = "image/png" | |
| elif self.screenshot_data.startswith("R0lGODlh"): | |
| mime_type = "image/gif" | |
| elif self.screenshot_data.startswith("UklGR"): | |
| mime_type = "image/webp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OpenHands /codereview-roasted read the comments above
the following block of code is 1. duplicated (copy-pasted from software-agent-sdk/openhands-tools/openhands/tools/browser_use/definition.py#L50-L59) 2. a bit hacky.
What are you suggestions to improve this?
if self.screenshot_data:
mime_type = "image/png"
if self.screenshot_data.startswith("/9j/"):
mime_type = "image/jpeg"
elif self.screenshot_data.startswith("iVBORw0KGgo"):
mime_type = "image/png"
elif self.screenshot_data.startswith("R0lGODlh"):
mime_type = "image/gif"
elif self.screenshot_data.startswith("UklGR"):
mime_type = "image/webp"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on it! simonrosenberg can track my progress at all-hands.dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review Summary
I've analyzed the duplicated and hacky MIME type detection code in PR #1016. Here's my assessment:
Problems Identified:
-
Code Duplication: The exact same MIME type detection logic (lines 344-354 in
file_editor/editor.py) is copy-pasted frombrowser_use/definition.py(lines 50-59) -
Wrong Approach: The code detects MIME types by pattern-matching base64-encoded strings rather than checking actual file data. This is brittle because you're inspecting an encoding artifact, not the actual file format.
-
Ignoring Available Data: In
file_editor/editor.py, you have the file path but still read the entire file, encode it to base64, and pattern-match strings - when Python'smimetypes.guess_type(path)would do this correctly in one line.
Recommended Solutions:
For file_editor.py (SIMPLEST FIX):
Replace lines 344-354 with:
import mimetypes
mime_type, _ = mimetypes.guess_type(path)
if not mime_type or not mime_type.startswith('image/'):
mime_type = 'application/octet-stream'This is simpler (3 lines vs 10), faster (no file inspection needed), more correct (uses standard library), and more maintainable (handles all formats automatically).
For long-term maintainability:
Create a shared utility function in openhands-tools/openhands/tools/utils/mime_type.py that:
- Uses
mimetypes.guess_type()when a file path is available - Falls back to checking actual magic numbers (not base64 prefixes) for raw data
- Can be used by both
file_editorandbrowser_useto eliminate duplication
Verdict:
❌ Needs rework - Violates DRY principle and uses incorrect approach for type detection
The current approach works but is unnecessarily complex and will become a maintenance burden. Using Python's standard library is the pragmatic, simple solution that eliminates special cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can implement the recommended solution @SmartManoj ? I dont think we need the long-term maintainability solution yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mime_type = 'application/octet-stream'
@simonrosenberg, what do @OpenHands think about quote-style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed " is preferred!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OpenHands creates an issue in this repo: we should be able to decide if we want to include this line of instruction based on whether the LLM supports image inputs -- when using OSS models that don't support LLM inputs, we should not include this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on it! xingyaoww can track my progress at all-hands.dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
I've successfully addressed the PR comment from @xingyaoww about making the image viewing instruction conditional based on LLM vision support.
Changes Made
✅ Request Checklist:
imgbranch📝 Implementation Details:
Modified
openhands-tools/openhands/tools/file_editor/definition.py:TOOL_DESCRIPTIONconstantFileEditorTool.create()that checksconv_state.agent.llm.vision_is_active()Added tests in
tests/tools/file_editor/test_file_editor_tool.py:test_file_editor_tool_image_viewing_line_with_vision_enabled(): Verifies instruction appears with gpt-4o-minitest_file_editor_tool_image_viewing_line_with_vision_disabled(): Verifies instruction is absent with gpt-3.5-turbo✅ Verification:
imgThe FileEditor tool now intelligently adapts its description based on the LLM's capabilities, preventing confusion when using models without vision support.
View full conversation