-
Notifications
You must be signed in to change notification settings - Fork 56
Refactor: Always include risk fields #1052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: Always include risk fields #1052
Conversation
- Modified tool schema generation to include security_risk field when add_security_risk_prediction=True for all tool types (including read-only tools) - Updated LLM security analyzer validation to always require security_risk field when using LLMSecurityAnalyzer - Added comprehensive test suite for security_risk validation behavior - Fixed existing tests to reflect new behavior where security_risk is included for read-only tools when prediction is enabled - Updated docstrings to clarify the new behavior Co-authored-by: openhands <openhands@all-hands.dev>
Coverage Report •
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
…ation - Created new SecurityAnalyzerConfigurationEvent class extending Event - Added event type to EventType literal and exports - Modified AgentBase.init_state to always emit SecurityAnalyzerConfigurationEvent - Added comprehensive tests for event creation and emission - Event tracks analyzer type (string name or None) and includes visualization methods Co-authored-by: openhands <openhands@all-hands.dev>
- Test all 5 required scenarios with parameterized tests - Case 1: LLM analyzer set, security risk passed, extracted properly - Case 2: analyzer not set, security risk passed, extracted properly - Case 3: LLM analyzer set, security risk not passed, ValueError raised - Case 4: analyzer not set, security risk not passed, UNKNOWN returned - Case 5: invalid security risk value passed, ValueError raised - Include additional tests for error messages and argument mutation - Follow existing test patterns and code style guidelines Co-authored-by: openhands <openhands@all-hands.dev>
…ehavior in conversations - Test new conversation initialization creates SystemPromptEvent and SecurityAnalyzerConfigurationEvent - Test reinitializing with same analyzer type creates new events (different instances) - Test reinitializing with same agent instance still creates new events - Test switching between different analyzers creates appropriate events - Test switching from no analyzer to analyzer creates events - Test multiple reinitializations create correct event sequences - Test event properties and methods validation - Use parameterized tests and fixtures following existing patterns - All 8 tests passing with proper edge case coverage Co-authored-by: openhands <openhands@all-hands.dev>
- Updated test_conversation_event_id_validation to expect duplicate event at index 2 instead of 1 - Updated test_pause_basic_functionality to expect 2 events instead of 1 - Both tests now account for the new SecurityAnalyzerConfigurationEvent being added during conversation initialization Co-authored-by: openhands <openhands@all-hands.dev>
…gurationEvent - Updated expected event count to account for additional SecurityAnalyzerConfigurationEvent - When conversation is loaded from persistence, agent initialization adds one more event - Changed assertion from original_event_count to original_event_count + 1 Co-authored-by: openhands <openhands@all-hands.dev>
- Added __eq__ method to SecurityAnalyzerConfigurationEvent to compare only analyzer_type - This prevents duplicate events when the same analyzer configuration is used - Updated test expectations to account for SecurityAnalyzerConfigurationEvent being added during initialization - Fixed test_conversation_event_id_validation to expect duplicate at index 2 (after SystemPromptEvent and SecurityAnalyzerConfigurationEvent) - Fixed test_conversation_persistence_lifecycle to expect same event count when loading from persistence Co-authored-by: openhands <openhands@all-hands.dev>
…//github.com/OpenHands/software-agent-sdk into refactor/always-include-security-risk-fields
This reverts commit 160a6a2.
- Fix _extract_security_risk method to always pop security_risk from arguments before checking readOnlyHint - Update test calls to include the new readOnlyHint parameter (3rd argument) - Add comprehensive test for readOnlyHint=True scenario - Ensure security_risk is properly removed from arguments in all cases Co-authored-by: openhands <openhands@all-hands.dev>
3a8f6bb to
4d58824
Compare
|
QA'ed with CLI and seems to work well |
| if "security_analyzer" in d and d["security_analyzer"]: | ||
| warnings.warn( | ||
| AGENT_SECURITY_ANALYZER_DEPRECATION_WARNING, | ||
| DeprecationWarning, | ||
| stacklevel=3, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OpenHands is there any good python library that allow us to manage deprecation warning in a more systematic way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on it! xingyaoww can track my progress at all-hands.dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
I answered your question about Python libraries for managing deprecation warnings systematically. Here are the recommendations:
Top 3 Recommended Libraries:
-
deprecation(Best for your use case)- Lightweight and purpose-built for deprecation management
- Supports version tracking (deprecated_in, removed_in, current_version)
- Integrates seamlessly with Python's warnings module
- Ideal for SDKs and libraries
- Installation:
pip install deprecation
-
Deprecated(deprecated.py)- Feature-rich with excellent Sphinx documentation integration
- Simple decorator syntax with version and reason tracking
- Very popular (4M+ downloads/month)
- Installation:
pip install Deprecated
-
debtcollector(From OpenStack)- Comprehensive solution for tracking both deprecations and technical debt
- Advanced features like move notifications and removal tracking
- Battle-tested in large projects
- Installation:
pip install debtcollector
Why deprecation is recommended for your codebase:
Looking at your current usage (multiple warnings.warn calls in agent/base.py, llm/llm.py, and conversation/conversation_stats.py), the deprecation library would provide:
- Version tracking for deprecation timelines
- Minimal overhead and dependencies
- Easy migration from your current
warningsusage - Clear, structured messaging
- SDK-friendly implementation
All three libraries provide decorators that make deprecation management more systematic and maintainable than the built-in warnings module alone.
No code changes were made as this was an informational question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OpenHands please add a test to tests/cross/test_remote_conversation_live_server.py where we spawns a real agent server to test this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on it! xingyaoww can track my progress at all-hands.dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
I've successfully added an integration test to tests/cross/test_remote_conversation_live_server.py that spawns a real agent server to test the security_risk field functionality introduced in PR #1052.
Changes Made
Added test: test_security_risk_field_with_live_server (~148 lines)
- Spawns a real agent server via the existing
server_envfixture - Uses a monkeypatched LLM completion that returns tool calls without the
security_riskfield - Validates that:
- Actions execute successfully even when
security_riskis missing (defaults to UNKNOWN) - ActionEvent objects always have a
security_riskattribute
- Actions execute successfully even when
Test Details
The test:
- Creates an agent with
LLMSecurityAnalyzerconfigured - Mocks LLM completions to return a
finishtool call without providing thesecurity_riskparameter - Verifies the action executes successfully (demonstrating the graceful default behavior)
- Confirms that the resulting ActionEvent has the
security_riskattribute
Validation
✅ Test passes successfully
✅ Pre-commit checks pass (ruff format, ruff lint, pycodestyle, pyright)
✅ Changes committed and pushed to the refactor/always-include-security-risk-fields branch
This integration test provides end-to-end validation of the security_risk field handling changes, ensuring the system gracefully handles cases where the field is missing while still maintaining backward compatibility.
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
xingyaoww
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Will merge this after tests pass
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 27s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 12s | $0.01 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 10s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ❌ FAIL (exit: 1) | 35s | $0.00 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 13s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 38s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 42s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 14s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 33s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 202s | $0.35 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 21s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 19s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 20s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 19s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 17s | $0.01 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 15s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 46s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 221s | $0.23 |
| 01_standalone_sdk/25_agent_delegation.py | ❌ FAIL (exit: 1) | 67s | $0.00 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 24s | $0.00N/A |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 69s | $0.06 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 136s | $0.07 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 67s | $0.05 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 90s | $0.02 |
❌ Some tests failed
Total: 24 | Passed: 22 | Failed: 2
🧪 Integration Tests ResultsOverall Success Rate: 93.8% 📁 Detailed Logs & ArtifactsClick the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Resultslitellm_proxy_gpt_5_mini_2025_08_07
litellm_proxy_claude_sonnet_4_5_20250929
litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
|
This test validates the fix for issue #819 where security_risk field handling was inconsistent. It spawns a real agent server and tests that: 1. Actions can be executed without security_risk (defaults to UNKNOWN) 2. ActionEvent always has a security_risk attribute The test uses a monkeypatched LLM completion to return a finish tool call without the security_risk field, ensuring the system handles it gracefully. Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Summary
This PR does the following
security_riskfieldsecurity_riskfield at runtime.security_riskis missing -> send an error event to correct itself (good for strong models)security_riskis missing -> default toSecurityRisk.UNKNOWNand proceed as usuallyBenefits
security_riskprovides the desired flexibilitysecurity_riskfield and they can be ignored depending on whether the analyzer has been configured)The linked issue contains the following scenarios; this PR helps address each one of them
In this scenario, the
security_riskparameter will still exist. The LLM can continue passing the field. We simply won't interfere with the control loop even though the agent is passing risk values. The agent can also forget to pass a risk parameter which will be ignored.This time
security_riskinformation in the system prompt will always exist at the start of a conversation. The LLM is always aware of it. If analyzer is not configured and the LLM forgets to pass a value, we'll default toSecurityRisk.UNKNOWNand proceed as usual. When security analyzer is added later in the conversation, we'll make sure that at runtime an error event is raised if the LLM fails to pass the risk parameter.The schema should always contain
security_risk, so loading back events regardless of whether a section of conversation had or did not have a security analyzer should work.Fixes #819
@malhotra5 can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:8aaca4b-pythonRun
All tags pushed for this build
About Multi-Architecture Support
8aaca4b-python) is a multi-arch manifest supporting both amd64 and arm648aaca4b-python-amd64) are also available if needed