Skip to content
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions openhands-tools/openhands/tools/file_editor/definition.py
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,7 @@ def _has_meaningful_diff(self) -> bool:
TOOL_DESCRIPTION = """Custom editing tool for viewing, creating and editing files in plain-text format
* State is persistent across command calls and discussions with the user
* If `path` is a text file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep
* If `path` is an image file (.png, .jpg, .jpeg, .gif, .webp, .bmp), `view` displays the image content
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands creates an issue in this repo: we should be able to decide if we want to include this line of instruction based on whether the LLM supports image inputs -- when using OSS models that don't support LLM inputs, we should not include this line

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! xingyaoww can track my progress at all-hands.dev

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

I've successfully addressed the PR comment from @xingyaoww about making the image viewing instruction conditional based on LLM vision support.

Changes Made

✅ Request Checklist:

  • Image viewing instruction is now conditionally included based on LLM vision support
  • OSS models without vision support won't see the misleading instruction
  • Vision-capable models will still get the image viewing instruction
  • Tests added to verify both scenarios
  • All pre-commit checks pass
  • Changes committed and pushed to the img branch

📝 Implementation Details:

  1. Modified openhands-tools/openhands/tools/file_editor/definition.py:

    • Removed hardcoded image viewing line from TOOL_DESCRIPTION constant
    • Added conditional logic in FileEditorTool.create() that checks conv_state.agent.llm.vision_is_active()
    • When vision is supported, the instruction is dynamically inserted at the correct position
    • When vision is not supported, the instruction is omitted entirely
  2. Added tests in tests/tools/file_editor/test_file_editor_tool.py:

    • test_file_editor_tool_image_viewing_line_with_vision_enabled(): Verifies instruction appears with gpt-4o-mini
    • test_file_editor_tool_image_viewing_line_with_vision_disabled(): Verifies instruction is absent with gpt-3.5-turbo

✅ Verification:

  • All existing tests pass (10/10 in test_file_editor_tool.py)
  • New tests pass and validate the conditional behavior
  • Pre-commit hooks pass (Ruff format, Ruff lint, pycodestyle, pyright)
  • Changes pushed to remote branch img

The FileEditor tool now intelligently adapts its description based on the LLM's capabilities, preventing confusion when using models without vision support.

View full conversation

* The `create` command cannot be used if the specified `path` already exists as a file
* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`
* The `undo_edit` command will revert the last edit made to the file at `path`
Expand Down
46 changes: 44 additions & 2 deletions openhands-tools/openhands/tools/file_editor/editor.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import base64
import os
import re
import shutil
Expand All @@ -7,6 +8,7 @@

from binaryornot.check import is_binary

from openhands.sdk import ImageContent, TextContent
from openhands.sdk.logger import get_logger
from openhands.sdk.utils.truncate import maybe_truncate
from openhands.tools.file_editor.definition import (
Expand Down Expand Up @@ -36,6 +38,9 @@

logger = get_logger(__name__)

# Supported image extensions for viewing as base64-encoded content
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}


class FileEditor:
"""
Expand Down Expand Up @@ -327,6 +332,42 @@ def view(
prev_exist=True,
)

# Check if the file is an image
file_extension = path.suffix.lower()
if file_extension in IMAGE_EXTENSIONS:
# Read image file as base64
try:
with open(path, "rb") as f:
image_bytes = f.read()
image_base64 = base64.b64encode(image_bytes).decode("utf-8")

mime_type = "image/png" # default
if image_base64.startswith("/9j/"):
mime_type = "image/jpeg"
elif image_base64.startswith("iVBORw0KGgo"):
mime_type = "image/png"
elif image_base64.startswith("R0lGODlh"):
mime_type = "image/gif"
elif image_base64.startswith("UklGR"):
mime_type = "image/webp"
elif image_base64.startswith("Qk"):
mime_type = "image/bmp"
Copy link
Collaborator

@simonrosenberg simonrosenberg Nov 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this!

I am no expert but all those if/elifs seem a bit hacky here. Would it make sense to use a library instead?

import mimetypes

mime_type, _ = mimetypes.guess_type(path)
mime_type = mime_type or "application/octet-stream"

or

import imghdr

image_type = imghdr.what(path)
mime_type = f"image/{image_type}" if image_type else "application/octet-stream"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use a library instead?

Yes.


Similar snippet in codebase:

if self.screenshot_data:
mime_type = "image/png"
if self.screenshot_data.startswith("/9j/"):
mime_type = "image/jpeg"
elif self.screenshot_data.startswith("iVBORw0KGgo"):
mime_type = "image/png"
elif self.screenshot_data.startswith("R0lGODlh"):
mime_type = "image/gif"
elif self.screenshot_data.startswith("UklGR"):
mime_type = "image/webp"

Copy link
Collaborator

@simonrosenberg simonrosenberg Nov 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands /codereview-roasted read the comments above
the following block of code is 1. duplicated (copy-pasted from software-agent-sdk/openhands-tools/openhands/tools/browser_use/definition.py#L50-L59) 2. a bit hacky.
What are you suggestions to improve this?

 if self.screenshot_data: 
     mime_type = "image/png" 
     if self.screenshot_data.startswith("/9j/"): 
         mime_type = "image/jpeg" 
     elif self.screenshot_data.startswith("iVBORw0KGgo"): 
         mime_type = "image/png" 
     elif self.screenshot_data.startswith("R0lGODlh"): 
         mime_type = "image/gif" 
     elif self.screenshot_data.startswith("UklGR"): 
         mime_type = "image/webp"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! simonrosenberg can track my progress at all-hands.dev

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

I've analyzed the duplicated and hacky MIME type detection code in PR #1016. Here's my assessment:

Problems Identified:

  1. Code Duplication: The exact same MIME type detection logic (lines 344-354 in file_editor/editor.py) is copy-pasted from browser_use/definition.py (lines 50-59)

  2. Wrong Approach: The code detects MIME types by pattern-matching base64-encoded strings rather than checking actual file data. This is brittle because you're inspecting an encoding artifact, not the actual file format.

  3. Ignoring Available Data: In file_editor/editor.py, you have the file path but still read the entire file, encode it to base64, and pattern-match strings - when Python's mimetypes.guess_type(path) would do this correctly in one line.

Recommended Solutions:

For file_editor.py (SIMPLEST FIX):
Replace lines 344-354 with:

import mimetypes

mime_type, _ = mimetypes.guess_type(path)
if not mime_type or not mime_type.startswith('image/'):
    mime_type = 'application/octet-stream'

This is simpler (3 lines vs 10), faster (no file inspection needed), more correct (uses standard library), and more maintainable (handles all formats automatically).

For long-term maintainability:
Create a shared utility function in openhands-tools/openhands/tools/utils/mime_type.py that:

  • Uses mimetypes.guess_type() when a file path is available
  • Falls back to checking actual magic numbers (not base64 prefixes) for raw data
  • Can be used by both file_editor and browser_use to eliminate duplication

Verdict:

Needs rework - Violates DRY principle and uses incorrect approach for type detection

The current approach works but is unnecessarily complex and will become a maintenance burden. Using Python's standard library is the pragmatic, simple solution that eliminates special cases.

View full conversation

Copy link
Collaborator

@simonrosenberg simonrosenberg Nov 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can implement the recommended solution @SmartManoj ? I dont think we need the long-term maintainability solution yet

Copy link
Contributor Author

@SmartManoj SmartManoj Nov 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mime_type = 'application/octet-stream'

@simonrosenberg, what do @OpenHands think about quote-style?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed " is preferred!

output_msg = (
f"Image file {path} read successfully. Displaying image content."
)
image_url = f"data:{mime_type};base64,{image_base64}"
return FileEditorObservation(
command="view",
content=[
TextContent(text=output_msg),
ImageContent(image_urls=[image_url]),
],
path=str(path),
prev_exist=True,
)
except Exception as e:
raise ToolError(f"Failed to read image file {path}: {e}") from None

# Validate file and count lines
self.validate_file(path)
num_lines = self._count_lines(path)
Expand Down Expand Up @@ -609,8 +650,9 @@ def validate_file(self, path: Path) -> None:
),
)

# Check file type
if is_binary(str(path)):
# Check file type - allow image files
file_extension = path.suffix.lower()
if is_binary(str(path)) and file_extension not in IMAGE_EXTENSIONS:
raise FileValidationError(
path=str(path),
reason=(
Expand Down
7 changes: 2 additions & 5 deletions tests/tools/file_editor/test_file_validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -96,8 +96,5 @@ def test_validate_image_file(tmp_path):

assert is_binary(str(image_file))

# Images are not supported and should be detected as binary
with pytest.raises(FileValidationError) as exc_info:
editor.validate_file(image_file)

assert "file appears to be binary" in str(exc_info.value).lower()
# Images are not supported, so no exception should be raised
editor.validate_file(image_file)
Loading