Commit d51a383

Author: matdev83 (committed)
feat: enhance hybrid backend repeat messages and edit precision integration
- Add HYBRID_BACKEND_REPEAT_MESSAGES environment variable and config option
- Implement temporary reasoning probability override from edit precision middleware
- Add hybrid reasoning disable mechanism after edit failures
- Update configuration examples and sample environment files
- Add comprehensive test coverage for hybrid reasoning override functionality
- Improve edit precision middleware with hybrid reasoning integration
- Add wire capture sample for debugging and analysis
- Update test suite state and changelog formatting
1 parent aadcd38 commit d51a383

File tree

11 files changed: +1785 −1512 lines changed


CHANGELOG.md

Lines changed: 2 additions & 11 deletions
@@ -2,8 +2,6 @@
 
 ## [2025-11-04]
 
-### Added
-
 - **Hybrid Backend Repeat Messages Feature**: New configuration option to repeat reasoning output as an artificial message in the session
 - **New Configuration Option**: `--hybrid-backend-repeat-messages` CLI flag and `HYBRID_BACKEND_REPEAT_MESSAGES` environment variable to enable the feature
 - **Artificial Message Injection**: When enabled, reasoning output is added as an artificial assistant message in the conversation history
@@ -23,7 +21,6 @@
 
 ## [2025-10-31]
 
-
 - **XML Tool Call Format Support**: Added support for XML tool call format in ToolCallRepairService
 - XML pattern detection and parsing for Kilo MCP tools
 - Support for both direct XML tool format and use_mcp_tool wrapper format
@@ -44,8 +41,6 @@
 
 ## [2025-10-23]
 
-### Added
-
 - **Intelligent Session Management**: Autonomous session continuity detection via message history fingerprinting
 - **Context Loss Prevention**: Eliminates session loss for stateless clients (e.g., Kilo Code, Cursor) that don't send session IDs
 - **Message History Fingerprinting**: Computes stable hashes from conversation sequences to detect continuity
@@ -65,8 +60,6 @@
 
 ## [2025-01-21]
 
-### Added
-
 - **LLM Assessment System**: Intelligent conversation quality monitoring inspired by Google's gemini-cli
 - Automatically detects unproductive patterns like repetitive tool calls and cognitive loops
 - Event-driven assessment triggers after configurable turn thresholds (default: 30 turns)
@@ -77,7 +70,7 @@
 - Graceful degradation - assessment failures never break main conversation flow
 - Complete documentation in README.md with configuration examples and use cases
 
-# 2025-10-17 - Gemini OAuth Backend Refactoring
+## 2025-10-17 - Gemini OAuth Backend Refactoring
 
 - **Refactor**: Split `gemini-oauth-personal` backend into two specialized backends for different use cases
 - **New Backend**: `gemini-oauth-free` for free-tier Gemini API usage with appropriate quotas and limits
@@ -86,7 +79,7 @@
 - **Migration**: Existing configurations automatically redirect to appropriate backend based on authentication type
 - **Testing**: Comprehensive test suites created for both new backends with full coverage of OAuth flows and API interactions
 
-# 2025-10-16 - Command Pipeline Policy & Regression Coverage
+## 2025-10-16 - Command Pipeline Policy & Regression Coverage
 
 - **Dependency Injection**: Command services now require explicit `ICommandPolicyService`
   and `ICommandStateService` instances. `CommandStage` wires the policy/state helpers,
@@ -288,8 +281,6 @@
 
 - **Maintenance**: Various code quality improvements including import organization, unused import removal, and code formatting consistency
 
-# Changelog
-
 ## 2025-10-01 - Refactor: Translation Service and Gemini Request Counting
 
 - **Refactor**: Centralized all request/response translation logic into a new `TranslationService` (`src/core/services/translation_service.py`). This improves modularity, simplifies maintenance, and makes it easier to add new API formats.

config/config.example.yaml

Lines changed: 6 additions & 5 deletions
@@ -94,6 +94,7 @@ logging:
 
 backends:
   default_backend: "openai"
+  # hybrid_backend_repeat_messages: false # Set to true to repeat reasoning output as an artificial message in the session
 
   openai:
     # API key set via OPENROUTER_API_KEY environment variable
@@ -114,7 +115,7 @@ backends:
     timeout: 150
 
   gemini:
-    # API key set via GEMINI_API_KEY environment variable
+    # GEMINI_API_KEY environment variable
     timeout: 120
 
   qwen_oauth:
@@ -140,10 +141,10 @@ model_defaults:
 # Failover routes
 failover_routes:
   default:
-   policy: "ordered"
-   elements:
-   - "openai:gpt-4"
-   - "openrouter:anthropic/claude-3-opus-20240229"
+    policy: "ordered"
+    elements:
+      - "openai:gpt-4"
+      - "openrouter:anthropic/claude-3-opus-20240229"
 
 # Model name rewrite rules (optional)
 # These rules allow you to dynamically rewrite model names before they are processed
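The failover route elements above follow a `backend:model` naming scheme. A minimal sketch of parsing one such element (the helper name `parse_route_element` is hypothetical, not part of the repository):

```python
def parse_route_element(element: str) -> tuple[str, str]:
    """Split a failover route element like "openai:gpt-4" into (backend, model).

    Model names may contain slashes (e.g. "anthropic/claude-3-opus-20240229"),
    so split only on the first colon.
    """
    backend, _, model = element.partition(":")
    if not backend or not model:
        raise ValueError(f"expected 'backend:model', got {element!r}")
    return backend, model

# Elements taken from the failover route in the config example above:
for element in ["openai:gpt-4", "openrouter:anthropic/claude-3-opus-20240229"]:
    print(parse_route_element(element))
```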

config/sample.env

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,8 @@ TOOL_CALL_REPAIR_BUFFER_CAP_BYTES=65536
 FORCE_REPROCESS_TOOL_CALLS=false
 # Log when tool calls are skipped (useful for visibility during development, default: false)
 # When enabled, logs will show which messages are being skipped to help understand the optimization
+# Enable hybrid backend to repeat reasoning messages as artificial messages in the session (default: false)
+HYBRID_BACKEND_REPEAT_MESSAGES=false
 LOG_SKIPPED_TOOL_CALLS=false
 
 # Loop Detection Settings
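The new flag is a boolean read from the environment. A minimal sketch of the kind of parsing involved (the helper name `env_flag` and the accepted truthy spellings are assumptions; the project's actual parser may differ):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret common truthy spellings of a boolean environment variable."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}

os.environ["HYBRID_BACKEND_REPEAT_MESSAGES"] = "false"
print(env_flag("HYBRID_BACKEND_REPEAT_MESSAGES"))  # False

os.environ["HYBRID_BACKEND_REPEAT_MESSAGES"] = "true"
print(env_flag("HYBRID_BACKEND_REPEAT_MESSAGES"))  # True
```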

data/test_suite_state.json

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 {
-  "test_count": 5057,
+  "test_count": 5058,
   "last_updated": "1762168167.0802596"
 }

sample_wire_capture.jsonl

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+{"timestamp_iso": "2025-01-10T15:58:41.039145+00:00", "timestamp_unix": 1736524721.039145, "direction": "inbound_response", "source": "127.0.0.1(Cline/1.0)", "destination": "qwen-oauth", "session_id": "session-abc123", "backend": "qwen-oauth", "model": "qwen3-coder-plus", "content_type": "json", "content_length": 1247, "payload": {"choices": [{"message": {"content": "Failed to edit, could not find the string to replace in the file. The SEARCH block content doesn't match exactly."}}], "usage": {"total_tokens": 150}}, "metadata": {"client_host": "127.0.0.1", "user_agent": "Cline/1.0", "request_id": "req_abc123"}}
+{"timestamp_iso": "2025-01-10T15:58:42.156234+00:00", "timestamp_unix": 1736524722.156234, "direction": "outbound_request", "source": "127.0.0.1(Cline/1.0)", "destination": "openai", "session_id": "session-abc123", "backend": "openai", "model": "gpt-4o", "content_type": "json", "content_length": 892, "payload": {"messages": [{"role": "user", "content": "I need to make changes to this file but the previous edit failed"}], "temperature": 0.1, "top_p": 0.3}, "metadata": {"client_host": "127.0.0.1", "user_agent": "Cline/1.0", "request_id": "req_abc124", "_edit_precision_mode": true, "_edit_precision_meta": {"original_temperature": 0.7, "original_top_p": 0.8, "applied_temperature": 0.1, "applied_top_p": 0.3}}}
+{"timestamp_iso": "2025-01-10T15:59:10.201456+00:00", "timestamp_unix": 1736524750.201456, "direction": "inbound_response", "source": "127.0.0.1(Cline/1.0)", "destination": "openai", "session_id": "session-abc123", "backend": "openai", "model": "gpt-4o", "content_type": "json", "content_length": 956, "payload": {"choices": [{"message": {"content": "Error: [patch_file] Error - old_string not found in content", "tool_calls": [{"function": {"name": "patch_file", "arguments": "{\"path\": \"example.py\", \"old_string\": \"def old_func():\", \"new_string\": \"def new_func():\"}", "status": "error"}}]}}], "usage": {"total_tokens": 200}}, "metadata": {"client_host": "127.0.0.1", "user_agent": "Cline/1.0", "request_id": "req_abc125"}}
+{"timestamp_iso": "2025-01-10T15:59:11.892789+00:00", "timestamp_unix": 1736524751.892789, "direction": "outbound_request", "source": "127.0.0.1(Cline/1.0)", "destination": "anthropic", "session_id": "session-abc123", "backend": "anthropic", "model": "claude-3-5-sonnet", "content_type": "json", "content_length": 1023, "payload": {"messages": [{"role": "user", "content": "I need to fix the previous edit that failed"}], "temperature": 0.1, "top_p": 0.3}, "metadata": {"client_host": "127.0.0.1", "user_agent": "Cline/1.0", "request_id": "req_abc126", "_edit_precision_mode": true, "_edit_precision_meta": {"original_temperature": 0.5, "original_top_p": 0.7, "applied_temperature": 0.1, "applied_top_p": 0.3}}}

src/connectors/hybrid.py

Lines changed: 26 additions & 5 deletions
@@ -1833,12 +1833,33 @@ async def chat_completions(
         has_reasoning_content = False
         reasoning_time = 0.0
 
-        # Decide whether to use the reasoning model
-        use_reasoning = (
-            random.random() < self.config.backends.reasoning_injection_probability
-        )
+        # Check for temporary reasoning injection probability override from edit precision middleware
+        temp_reasoning_probability = None
+        if isinstance(request_data, dict):
+            extra_body = request_data.get("extra_body", {})
+        else:
+            extra_body = getattr(request_data, "extra_body", {})
+        if extra_body is None:
+            extra_body = {}
+
+        # Check if edit precision middleware has set a temporary override
+        temp_prob_override = extra_body.get("_temp_hybrid_reasoning_probability")
+        if temp_prob_override is not None:
+            temp_reasoning_probability = float(temp_prob_override)
+            # Log that we're using a temporary override
+            logger.info(
+                f"Using temporary reasoning injection probability override: {temp_reasoning_probability} for session",
+                extra={"session_id": session_id},
+            )
+        else:
+            temp_reasoning_probability = (
+                self.config.backends.reasoning_injection_probability
+            )
+
+        # Decide whether to use the reasoning model with the (potentially overridden) probability
+        use_reasoning = random.random() < temp_reasoning_probability
         logger.info(
-            f"Reasoning model injection decision: {'USE' if use_reasoning else 'SKIP'}"
+            f"Reasoning model injection decision: {'USE' if use_reasoning else 'SKIP'}, probability={temp_reasoning_probability}"
        )
 
         if use_reasoning:
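The override lookup added by this diff can be exercised in isolation. A minimal sketch of the same logic, assuming the `_temp_hybrid_reasoning_probability` key from the diff; the standalone helper name `resolve_reasoning_probability` is not part of the repository:

```python
from typing import Any

def resolve_reasoning_probability(request_data: Any, default_probability: float) -> float:
    """Mirror the diff's lookup: prefer a temporary override planted in
    extra_body by the edit precision middleware, else fall back to the
    configured reasoning injection probability."""
    if isinstance(request_data, dict):
        extra_body = request_data.get("extra_body", {})
    else:
        extra_body = getattr(request_data, "extra_body", {})
    if extra_body is None:
        extra_body = {}

    override = extra_body.get("_temp_hybrid_reasoning_probability")
    if override is not None:
        return float(override)
    return default_probability

# After an edit failure the middleware can set the override to 0.0,
# disabling reasoning injection for the retried request:
request = {"extra_body": {"_temp_hybrid_reasoning_probability": 0.0}}
print(resolve_reasoning_probability(request, 0.25))  # 0.0
print(resolve_reasoning_probability({}, 0.25))       # 0.25 (configured default)
```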
