Commit 384c802

Author: matdev83
Manual Merge PR #629. Add model URI params. Fix retry handling after failed file edits. Improved MCP argument parsing

1 parent e00f40a, commit 384c802

16 files changed: +1480 additions, -932 deletions

data/test_suite_state.json (1 addition, 1 deletion)

```diff
@@ -1,4 +1,4 @@
 {
-  "test_count": 5094,
+  "test_count": 5099,
   "last_updated": "1762168167.0802596"
 }
```

docs/zai-max-tokens-implementation.md (24 additions, 24 deletions)

````diff
@@ -2,33 +2,33 @@
 
 ## Overview
 
-Both ZAI connectors (`zai` and `zai-coding-plan`) now enforce a 128K (131,072 tokens) maximum output limit as specified by the ZAI API provider.
+Both ZAI connectors (`zai` and `zai-coding-plan`) now enforce a 200K (200,000 tokens) maximum output limit as specified by the ZAI API provider.
 
 ## Implementation Details
 
 ### Default Behavior
-- **Default max_tokens**: 131,072 (128K)
-  - This is the maximum supported by ZAI's backend models
-  - Used when client doesn't explicitly specify max_tokens or provides invalid values (None, 0, negative)
+- **Default max_tokens**: 200,000 (200K)
+  - This is the maximum supported by ZAI's backend models
+  - Used when client doesn't explicitly specify max_tokens or provides invalid values (None, 0, negative)
 
 ### Client Override Rules
 Clients can override the default by explicitly setting `max_tokens` in their request:
 
-1. **Valid Range**: 1,024 to 131,072 tokens
+1. **Valid Range**: 1,024 to 200,000 tokens
    - Values below 1K are clamped to 1,024
-   - Values above 128K are clamped to 131,072
+   - Values above 200K are clamped to 200,000
   - Values within range are preserved as-is
 
 2. **Invalid Values**: None, 0, or negative numbers
-   - Automatically use the 128K default
+   - Automatically use the 200K default
   - Ensures requests never fail due to missing/invalid max_tokens
 
 ### Code Locations
 
-#### ZaiCodingPlanBackend
-- File: `src/connectors/zai_coding_plan.py`
-- Method: `_prepare_payload()`
-- Inherits from: `OpenAIConnector`
+#### ZaiCodingPlanBackend
+- File: `src/connectors/zai_coding_plan.py`
+- Method: `_prepare_payload()`
+- Inherits from: `OpenAIConnector`
 
 #### ZAIConnector
 - File: `src/connectors/zai.py`
@@ -38,19 +38,19 @@ Clients can override the default by explicitly setting `max_tokens` in their req
 ## Examples
 
 ### Example 1: No max_tokens specified
-```python
-request = {
-    "model": "zai-coding-plan:glm-4.6",
-    "messages": [{"role": "user", "content": "Hello"}],
-    # max_tokens not specified
-}
-# Result: max_tokens = 131072 (128K)
-```
+```python
+request = {
+    "model": "zai-coding-plan:glm-4.6",
+    "messages": [{"role": "user", "content": "Hello"}],
+    # max_tokens not specified
+}
+# Result: max_tokens = 200000 (200K)
+```
 
 ### Example 2: Explicit valid value
 ```python
 request = {
-    "model": "zai-coding-plan:glm-4.6",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 4096
 }
@@ -60,7 +60,7 @@ request = {
 ### Example 3: Value below minimum
 ```python
 request = {
-    "model": "zai-coding-plan:glm-4.6",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 512
 }
@@ -70,11 +70,11 @@ request = {
 ### Example 4: Value above maximum
 ```python
 request = {
-    "model": "zai-coding-plan:glm-4.6",
+    "model": "zai-coding-plan:glm-4.6",
     "messages": [{"role": "user", "content": "Hello"}],
     "max_tokens": 200000
 }
-# Result: max_tokens = 131072 (clamped to maximum)
+# Result: max_tokens = 200000 (clamped to maximum)
 ```
 
 ## Testing
@@ -91,7 +91,7 @@ All tests pass successfully.
 ## Benefits
 
 1. **Prevents 422 Errors**: Ensures max_tokens is always valid
-2. **Maximizes Output**: Uses 128K by default for agentic coding tasks
+2. **Maximizes Output**: Uses 200K by default for agentic coding tasks
 3. **Client Control**: Allows explicit override within valid range
 4. **Robust**: Handles edge cases (None, 0, negative, out-of-range)
 5. **Consistent**: Same logic across both ZAI connectors
````

src/connectors/gemini.py (98 additions, 7 deletions)

```diff
@@ -13,6 +13,7 @@
 
 from src.connectors.base import LLMBackend
 from src.core.common.exceptions import (
+    AuthenticationError,
     BackendError,
     ServiceUnavailableError,
 )
@@ -415,7 +416,7 @@ async def chat_completions(  # type: ignore[override]
         effective_model: str,
         identity: IAppIdentityConfig | None = None,
         openrouter_api_base_url: str | None = None,
-        openrouter_headers_provider: Callable[[str, str], dict[str, str]] | None = None,
+        openrouter_headers_provider: Callable[[Any, str], dict[str, str]] | None = None,
         key_name: str | None = None,
         api_key: str | None = None,
         project: str | None = None,
@@ -425,7 +426,12 @@ async def chat_completions(  # type: ignore[override]
     ) -> ResponseEnvelope | StreamingResponseEnvelope:
         # Resolve base configuration
         base_api_url, headers = await self._resolve_gemini_api_config(
-            gemini_api_base_url, openrouter_api_base_url, api_key, **kwargs
+            gemini_api_base_url,
+            openrouter_api_base_url,
+            api_key,
+            openrouter_headers_provider=openrouter_headers_provider,
+            key_name=key_name,
+            **kwargs,
         )
         if identity:
             headers.update(identity.get_resolved_headers(None))
@@ -530,11 +536,31 @@ async def chat_completions(  # type: ignore[override]
             model_url, payload, headers, effective_model
         )
 
+    def _build_openrouter_header_context(self) -> dict[str, str]:
+        referer = "http://localhost:8000"
+        title = "InterceptorProxy"
+
+        identity = getattr(self.config, "identity", None)
+        if identity is not None:
+            referer = (
+                getattr(getattr(identity, "url", None), "default_value", referer)
+                or referer
+            )
+            title = (
+                getattr(getattr(identity, "title", None), "default_value", title)
+                or title
+            )
+
+        return {"app_site_url": referer, "app_x_title": title}
+
     async def _resolve_gemini_api_config(
         self,
         gemini_api_base_url: str | None,
         openrouter_api_base_url: str | None,
         api_key: str | None,
+        *,
+        openrouter_headers_provider: Callable[[Any, str], dict[str, str]] | None = None,
+        key_name: str | None = None,
         **kwargs: Any,
     ) -> tuple[str, dict[str, str]]:
         # Prefer explicit params, then kwargs, then instance attributes set during initialize
@@ -550,12 +576,77 @@ async def _resolve_gemini_api_config(
                 status_code=500,
                 detail="Gemini API base URL and API key must be provided.",
             )
-        key_name_to_use = (
-            kwargs.get("key_name")
-            or getattr(self, "key_name", None)
-            or "x-goog-api-key"
+        normalized_base = base.rstrip("/")
+
+        # Only use OpenRouter mode if the chosen base is actually OpenRouter.
+        # OpenRouter mode should only be enabled when the resolved base URL is different
+        # from the default Gemini API base URL, indicating we're actually routing to OpenRouter.
+        gemini_default_base = "https://generativelanguage.googleapis.com"
+        using_openrouter = (
+            openrouter_api_base_url is not None
+            and normalized_base != gemini_default_base.rstrip("/")
         )
-        return base.rstrip("/"), ensure_loop_guard_header({key_name_to_use: key})
+
+        headers: dict[str, str]
+        if using_openrouter:
+            headers = {}
+            provided_headers: dict[str, str] | None = None
+
+            if openrouter_headers_provider is not None:
+                errors: list[Exception] = []
+
+                if key_name is not None:
+                    try:
+                        candidate = openrouter_headers_provider(key_name, key)
+                    except (AttributeError, TypeError) as exc:
+                        errors.append(exc)
+                    else:
+                        if candidate:
+                            provided_headers = dict(candidate)
+
+                if provided_headers is None:
+                    context = self._build_openrouter_header_context()
+                    try:
+                        candidate = openrouter_headers_provider(context, key)
+                    except Exception as exc:  # pragma: no cover - defensive guard
+                        if errors and logger.isEnabledFor(logging.DEBUG):
+                            logger.debug(
+                                "OpenRouter headers provider rejected key_name input: %s",
+                                errors[-1],
+                                exc_info=True,
+                            )
+                        raise AuthenticationError(
+                            message="OpenRouter headers provider failed to produce headers.",
+                            code="missing_credentials",
+                        ) from exc
+                    else:
+                        provided_headers = dict(candidate)
+
+            if provided_headers is None:
+                context = self._build_openrouter_header_context()
+                provided_headers = {
+                    "Authorization": f"Bearer {key}",
+                    "Content-Type": "application/json",
+                    "HTTP-Referer": context["app_site_url"],
+                    "X-Title": context["app_x_title"],
+                }
+
+            headers.update(provided_headers)
+            context = self._build_openrouter_header_context()
+            headers.setdefault("Authorization", f"Bearer {key}")
+            headers.setdefault("Content-Type", "application/json")
+            headers.setdefault("HTTP-Referer", context["app_site_url"])
+            headers.setdefault("X-Title", context["app_x_title"])
+        else:
+            key_name_to_use = (
+                key_name
+                or kwargs.get("key_name")
+                or getattr(self, "key_name", None)
+                or "x-goog-api-key"
+            )
+            headers = {key_name_to_use: key}
+
+        return normalized_base, ensure_loop_guard_header(headers)
 
     def _apply_generation_config(
         self, payload: dict[str, Any], request_data: ChatRequest
```

src/connectors/qwen_oauth.py (38 additions, 0 deletions)

```diff
@@ -238,6 +238,44 @@ def _launch_cli_refresh_process(self) -> None:
                 exc_info=True,
             )
 
+    async def _prepare_payload(
+        self,
+        request_data: Any,
+        processed_messages: list[Any],
+        effective_model: str,
+    ) -> dict[str, Any]:
+        """Ensure sampling parameters are forwarded to the Qwen API payload."""
+
+        payload = await super()._prepare_payload(
+            request_data, processed_messages, effective_model
+        )
+
+        def _extract_param(name: str) -> Any | None:
+            value = getattr(request_data, name, None)
+            if value is None and isinstance(request_data, dict):
+                value = request_data.get(name)
+            if value is None:
+                extra_body = getattr(request_data, "extra_body", None)
+                if isinstance(extra_body, dict):
+                    value = extra_body.get(name)
+            return value
+
+        top_p = _extract_param("top_p")
+        if top_p is not None:
+            try:
+                payload["top_p"] = float(top_p)
+            except (TypeError, ValueError):
+                logger.debug("Ignoring non-numeric top_p value: %r", top_p)
+
+        top_k = _extract_param("top_k")
+        if top_k is not None:
+            try:
+                payload["top_k"] = int(top_k)
+            except (TypeError, ValueError):
+                logger.debug("Ignoring non-integer top_k value: %r", top_k)
+
+        return payload
+
     async def _poll_for_new_token(self, max_wait_seconds: float | None = None) -> bool:
         """Poll the credential file for an updated token after CLI refresh."""
         if not self._is_token_expired():
```

src/connectors/zai.py (34 additions, 6 deletions)

```diff
@@ -2,6 +2,7 @@
 ZAI connector for Zhipu AI's GLM models
 """
 
+import logging
 from pathlib import Path
 from typing import TYPE_CHECKING, Any
 
@@ -21,6 +22,9 @@
 from src.core.services.translation_service import TranslationService
 
 
+logger = logging.getLogger(__name__)
+
+
 class ZAIConnector(OpenAIConnector):
     """ZAI backend connector for Zhipu AI's GLM models."""
 
@@ -134,35 +138,59 @@ async def _prepare_payload(
         effective_model: str,
     ) -> dict[str, Any]:
         """
-        Prepare payload for ZAI backend with 128K max_tokens support.
+        Prepare payload for ZAI backend with 200K max_tokens support.
 
-        ZAI backend supports up to 128K output tokens. This method ensures
+        ZAI backend supports up to 200K output tokens. This method ensures
         max_tokens is set appropriately based on client request.
         """
         payload = await super()._prepare_payload(
             request_data, processed_messages, effective_model
         )
 
+        def _extract_param(name: str) -> Any | None:
+            value = getattr(request_data, name, None)
+            if value is None and isinstance(request_data, dict):
+                value = request_data.get(name)
+            if value is None:
+                extra_body = getattr(request_data, "extra_body", None)
+                if isinstance(extra_body, dict):
+                    value = extra_body.get(name)
+            return value
+
+        top_p = _extract_param("top_p")
+        if top_p is not None:
+            try:
+                payload["top_p"] = float(top_p)
+            except (TypeError, ValueError):
+                logger.debug("Ignoring non-numeric top_p value for ZAI: %r", top_p)
+
+        top_k = _extract_param("top_k")
+        if top_k is not None:
+            try:
+                payload["top_k"] = int(top_k)
+            except (TypeError, ValueError):
+                logger.debug("Ignoring non-integer top_k value for ZAI: %r", top_k)
+
         # ZAI currently breaks tool calling when reasoning is enabled. The upstream
         # service does not support the OpenAI reasoning payload, so strip any
         # client-specified reasoning configuration before sending the request.
         payload.pop("reasoning", None)
         payload.pop("reasoning_effort", None)
 
-        # ZAI backend supports up to 128K output tokens
+        # ZAI backend supports up to 200K output tokens
         # Override max_tokens only if client explicitly set a valid positive value
         requested_max_tokens = getattr(request_data, "max_tokens", None)
 
         if requested_max_tokens is not None and requested_max_tokens > 0:
             # Client explicitly requested a value - validate and clamp to valid range
             # Only enforce maximum limit, allow any positive value as minimum
-            if requested_max_tokens > 131072:  # 128K
-                payload["max_tokens"] = 131072
+            if requested_max_tokens > 200000:  # 200K
+                payload["max_tokens"] = 200000
             else:
                 payload["max_tokens"] = requested_max_tokens
         else:
             # No explicit request or invalid value (None, 0, negative) - use ZAI's max
-            payload["max_tokens"] = 131072  # 128K default for ZAI
+            payload["max_tokens"] = 200000  # 200K default for ZAI
 
         return payload
```