
Commit ed9e2dd

feat(api): Realtime API token_limits, Hybrid searching ranking options
1 parent: 0393d90

19 files changed: +273 −59 lines changed

.stats.yml

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
 configured_endpoints: 136
-openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/openai%2Fopenai-f68f718cd45ac3f9336603601bccc38a718af44d0b26601031de3d0a71b7ce2f.yml
-openapi_spec_hash: 1560717860bba4105936647dde8f618d
-config_hash: 50ee3382a63c021a9f821a935950e926
+openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/openai%2Fopenai-3c5d1593d7c6f2b38a7d78d7906041465ee9d6e9022f0651e1da194654488108.yml
+openapi_spec_hash: 0a4d8ad2469823ce24a3fd94f23f1c2b
+config_hash: 032995825500a503a76da119f5354905

src/openai/resources/images.py

Lines changed: 24 additions & 6 deletions
@@ -168,7 +168,10 @@ def edit(
           If `transparent`, the output format needs to support transparency, so it should
           be set to either `png` (default value) or `webp`.

-          input_fidelity: Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and `low`. Defaults to `low`.
+          input_fidelity: Control how much effort the model will exert to match the style and features,
+              especially facial features, of input images. This parameter is only supported
+              for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and
+              `low`. Defaults to `low`.

           mask: An additional image whose fully transparent areas (e.g. where alpha is zero)
               indicate where `image` should be edited. If there are multiple images provided,
@@ -282,7 +285,10 @@ def edit(
           If `transparent`, the output format needs to support transparency, so it should
           be set to either `png` (default value) or `webp`.

-          input_fidelity: Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and `low`. Defaults to `low`.
+          input_fidelity: Control how much effort the model will exert to match the style and features,
+              especially facial features, of input images. This parameter is only supported
+              for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and
+              `low`. Defaults to `low`.

           mask: An additional image whose fully transparent areas (e.g. where alpha is zero)
               indicate where `image` should be edited. If there are multiple images provided,
@@ -392,7 +398,10 @@ def edit(
           If `transparent`, the output format needs to support transparency, so it should
           be set to either `png` (default value) or `webp`.

-          input_fidelity: Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and `low`. Defaults to `low`.
+          input_fidelity: Control how much effort the model will exert to match the style and features,
+              especially facial features, of input images. This parameter is only supported
+              for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and
+              `low`. Defaults to `low`.

           mask: An additional image whose fully transparent areas (e.g. where alpha is zero)
               indicate where `image` should be edited. If there are multiple images provided,
@@ -1046,7 +1055,10 @@ async def edit(
           If `transparent`, the output format needs to support transparency, so it should
           be set to either `png` (default value) or `webp`.

-          input_fidelity: Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and `low`. Defaults to `low`.
+          input_fidelity: Control how much effort the model will exert to match the style and features,
+              especially facial features, of input images. This parameter is only supported
+              for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and
+              `low`. Defaults to `low`.

           mask: An additional image whose fully transparent areas (e.g. where alpha is zero)
               indicate where `image` should be edited. If there are multiple images provided,
@@ -1160,7 +1172,10 @@ async def edit(
           If `transparent`, the output format needs to support transparency, so it should
           be set to either `png` (default value) or `webp`.

-          input_fidelity: Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and `low`. Defaults to `low`.
+          input_fidelity: Control how much effort the model will exert to match the style and features,
+              especially facial features, of input images. This parameter is only supported
+              for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and
+              `low`. Defaults to `low`.

           mask: An additional image whose fully transparent areas (e.g. where alpha is zero)
               indicate where `image` should be edited. If there are multiple images provided,
@@ -1270,7 +1285,10 @@ async def edit(
           If `transparent`, the output format needs to support transparency, so it should
           be set to either `png` (default value) or `webp`.

-          input_fidelity: Control how much effort the model will exert to match the style and features, especially facial features, of input images. This parameter is only supported for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and `low`. Defaults to `low`.
+          input_fidelity: Control how much effort the model will exert to match the style and features,
+              especially facial features, of input images. This parameter is only supported
+              for `gpt-image-1`. Unsupported for `gpt-image-1-mini`. Supports `high` and
+              `low`. Defaults to `low`.

           mask: An additional image whose fully transparent areas (e.g. where alpha is zero)
               indicate where `image` should be edited. If there are multiple images provided,
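For reference, a minimal usage sketch of the parameter whose docstring was rewrapped above. The call follows the `images.edit` signature in this file; the file name and prompt are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Ask the model to stay close to the source image's features, especially
# facial features. Only supported for `gpt-image-1`; defaults to "low".
result = client.images.edit(
    model="gpt-image-1",
    image=open("portrait.png", "rb"),  # illustrative input file
    prompt="Add a red scarf to the subject",
    input_fidelity="high",
)
```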

src/openai/resources/realtime/calls.py

Lines changed: 26 additions & 4 deletions
@@ -195,8 +195,19 @@ def accept(
           `auto` will create a trace for the session with default values for the workflow
           name, group id, and metadata.

-          truncation: Controls how the realtime conversation is truncated prior to model inference.
-              The default is `auto`.
+          truncation: When the number of tokens in a conversation exceeds the model's input token
+              limit, the conversation will be truncated, meaning messages (starting from the
+              oldest) will not be included in the model's context. A 32k context model with
+              4,096 max output tokens can only include 28,224 tokens in the context before
+              truncation occurs. Clients can configure truncation behavior to truncate with a
+              lower max token limit, which is an effective way to control token usage and
+              cost. Truncation will reduce the number of cached tokens on the next turn
+              (busting the cache), since messages are dropped from the beginning of the
+              context. However, clients can also configure truncation to retain messages up to
+              a fraction of the maximum context size, which will reduce the need for future
+              truncations and thus improve the cache rate. Truncation can be disabled
+              entirely, which means the server will never truncate but would instead return an
+              error if the conversation exceeds the model's input token limit.

           extra_headers: Send extra headers
@@ -504,8 +515,19 @@ async def accept(
           `auto` will create a trace for the session with default values for the workflow
           name, group id, and metadata.

-          truncation: Controls how the realtime conversation is truncated prior to model inference.
-              The default is `auto`.
+          truncation: When the number of tokens in a conversation exceeds the model's input token
+              limit, the conversation will be truncated, meaning messages (starting from the
+              oldest) will not be included in the model's context. A 32k context model with
+              4,096 max output tokens can only include 28,224 tokens in the context before
+              truncation occurs. Clients can configure truncation behavior to truncate with a
+              lower max token limit, which is an effective way to control token usage and
+              cost. Truncation will reduce the number of cached tokens on the next turn
+              (busting the cache), since messages are dropped from the beginning of the
+              context. However, clients can also configure truncation to retain messages up to
+              a fraction of the maximum context size, which will reduce the need for future
+              truncations and thus improve the cache rate. Truncation can be disabled
+              entirely, which means the server will never truncate but would instead return an
+              error if the conversation exceeds the model's input token limit.

           extra_headers: Send extra headers
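A hedged sketch of the behavior the new docstring describes: the retention-ratio form of `RealtimeTruncationParam` retains messages up to a fraction of the maximum context size, while the strings `auto` and `disabled` select the default and no-truncation modes. The call ID, model name, and 0.5 ratio below are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Retain roughly half the context window after each truncation pass. This
# reduces how often truncation recurs and improves the prefix-cache hit
# rate, at the cost of a smaller usable context.
client.realtime.calls.accept(
    "rtc_abc123",  # illustrative call ID
    type="realtime",
    model="gpt-realtime",
    truncation={"type": "retention_ratio", "retention_ratio": 0.5},
)
```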

src/openai/resources/vector_stores/file_batches.py

Lines changed: 28 additions & 12 deletions
@@ -52,9 +52,10 @@ def create(
         self,
         vector_store_id: str,
         *,
-        file_ids: SequenceNotStr[str],
         attributes: Optional[Dict[str, Union[str, float, bool]]] | Omit = omit,
         chunking_strategy: FileChunkingStrategyParam | Omit = omit,
+        file_ids: SequenceNotStr[str] | Omit = omit,
+        files: Iterable[file_batch_create_params.File] | Omit = omit,
         # Use the following arguments if you need to pass additional parameters to the API that aren't available via kwargs.
         # The extra values given here take precedence over values defined on the client or passed to this method.
         extra_headers: Headers | None = None,
@@ -66,10 +67,6 @@ def create(
         Create a vector store file batch.

         Args:
-          file_ids: A list of [File](https://platform.openai.com/docs/api-reference/files) IDs that
-              the vector store should use. Useful for tools like `file_search` that can access
-              files.
-
           attributes: Set of 16 key-value pairs that can be attached to an object. This can be useful
               for storing additional information about the object in a structured format, and
               querying for objects via API or the dashboard. Keys are strings with a maximum
@@ -79,6 +76,16 @@ def create(
           chunking_strategy: The chunking strategy used to chunk the file(s). If not set, will use the `auto`
               strategy. Only applicable if `file_ids` is non-empty.

+          file_ids: A list of [File](https://platform.openai.com/docs/api-reference/files) IDs that
+              the vector store should use. Useful for tools like `file_search` that can access
+              files. If `attributes` or `chunking_strategy` are provided, they will be applied
+              to all files in the batch. Mutually exclusive with `files`.
+
+          files: A list of objects that each include a `file_id` plus optional `attributes` or
+              `chunking_strategy`. Use this when you need to override metadata for specific
+              files. The global `attributes` or `chunking_strategy` will be ignored and must
+              be specified for each file. Mutually exclusive with `file_ids`.
+
           extra_headers: Send extra headers

           extra_query: Add additional query parameters to the request
@@ -94,9 +101,10 @@ def create(
             f"/vector_stores/{vector_store_id}/file_batches",
             body=maybe_transform(
                 {
-                    "file_ids": file_ids,
                     "attributes": attributes,
                     "chunking_strategy": chunking_strategy,
+                    "file_ids": file_ids,
+                    "files": files,
                 },
                 file_batch_create_params.FileBatchCreateParams,
             ),
@@ -389,9 +397,10 @@ async def create(
         self,
         vector_store_id: str,
         *,
-        file_ids: SequenceNotStr[str],
         attributes: Optional[Dict[str, Union[str, float, bool]]] | Omit = omit,
         chunking_strategy: FileChunkingStrategyParam | Omit = omit,
+        file_ids: SequenceNotStr[str] | Omit = omit,
+        files: Iterable[file_batch_create_params.File] | Omit = omit,
         # Use the following arguments if you need to pass additional parameters to the API that aren't available via kwargs.
         # The extra values given here take precedence over values defined on the client or passed to this method.
         extra_headers: Headers | None = None,
@@ -403,10 +412,6 @@ async def create(
         Create a vector store file batch.

         Args:
-          file_ids: A list of [File](https://platform.openai.com/docs/api-reference/files) IDs that
-              the vector store should use. Useful for tools like `file_search` that can access
-              files.
-
           attributes: Set of 16 key-value pairs that can be attached to an object. This can be useful
               for storing additional information about the object in a structured format, and
               querying for objects via API or the dashboard. Keys are strings with a maximum
@@ -416,6 +421,16 @@ async def create(
           chunking_strategy: The chunking strategy used to chunk the file(s). If not set, will use the `auto`
               strategy. Only applicable if `file_ids` is non-empty.

+          file_ids: A list of [File](https://platform.openai.com/docs/api-reference/files) IDs that
+              the vector store should use. Useful for tools like `file_search` that can access
+              files. If `attributes` or `chunking_strategy` are provided, they will be applied
+              to all files in the batch. Mutually exclusive with `files`.
+
+          files: A list of objects that each include a `file_id` plus optional `attributes` or
+              `chunking_strategy`. Use this when you need to override metadata for specific
+              files. The global `attributes` or `chunking_strategy` will be ignored and must
+              be specified for each file. Mutually exclusive with `file_ids`.
+
           extra_headers: Send extra headers

           extra_query: Add additional query parameters to the request
@@ -431,9 +446,10 @@ async def create(
             f"/vector_stores/{vector_store_id}/file_batches",
             body=await async_maybe_transform(
                 {
-                    "file_ids": file_ids,
                     "attributes": attributes,
                     "chunking_strategy": chunking_strategy,
+                    "file_ids": file_ids,
+                    "files": files,
                 },
                 file_batch_create_params.FileBatchCreateParams,
             ),
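A sketch of the two now mutually exclusive request forms, with illustrative IDs and values: `file_ids` for a uniform batch, `files` when individual entries need their own `attributes` or `chunking_strategy`.

```python
from openai import OpenAI

client = OpenAI()

# Uniform batch: one chunking strategy applied to every file in the batch.
client.vector_stores.file_batches.create(
    vector_store_id="vs_abc123",  # illustrative ID
    file_ids=["file-111", "file-222"],
    chunking_strategy={"type": "auto"},
)

# Per-file batch: each entry carries its own metadata. In this form the
# global `attributes`/`chunking_strategy` arguments are ignored and must be
# given per file.
client.vector_stores.file_batches.create(
    vector_store_id="vs_abc123",
    files=[
        {"file_id": "file-111", "attributes": {"region": "emea"}},
        {
            "file_id": "file-222",
            "chunking_strategy": {
                "type": "static",
                "static": {"max_chunk_size_tokens": 800, "chunk_overlap_tokens": 400},
            },
        },
    ],
)
```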

src/openai/types/realtime/call_accept_params.py

Lines changed: 13 additions & 2 deletions
@@ -106,6 +106,17 @@ class CallAcceptParams(TypedDict, total=False):

     truncation: RealtimeTruncationParam
     """
-    Controls how the realtime conversation is truncated prior to model inference.
-    The default is `auto`.
+    When the number of tokens in a conversation exceeds the model's input token
+    limit, the conversation will be truncated, meaning messages (starting from the
+    oldest) will not be included in the model's context. A 32k context model with
+    4,096 max output tokens can only include 28,224 tokens in the context before
+    truncation occurs. Clients can configure truncation behavior to truncate with a
+    lower max token limit, which is an effective way to control token usage and
+    cost. Truncation will reduce the number of cached tokens on the next turn
+    (busting the cache), since messages are dropped from the beginning of the
+    context. However, clients can also configure truncation to retain messages up to
+    a fraction of the maximum context size, which will reduce the need for future
+    truncations and thus improve the cache rate. Truncation can be disabled
+    entirely, which means the server will never truncate but would instead return an
+    error if the conversation exceeds the model's input token limit.
     """

src/openai/types/realtime/realtime_session_create_request.py

Lines changed: 13 additions & 2 deletions
@@ -106,6 +106,17 @@ class RealtimeSessionCreateRequest(BaseModel):

     truncation: Optional[RealtimeTruncation] = None
     """
-    Controls how the realtime conversation is truncated prior to model inference.
-    The default is `auto`.
+    When the number of tokens in a conversation exceeds the model's input token
+    limit, the conversation will be truncated, meaning messages (starting from the
+    oldest) will not be included in the model's context. A 32k context model with
+    4,096 max output tokens can only include 28,224 tokens in the context before
+    truncation occurs. Clients can configure truncation behavior to truncate with a
+    lower max token limit, which is an effective way to control token usage and
+    cost. Truncation will reduce the number of cached tokens on the next turn
+    (busting the cache), since messages are dropped from the beginning of the
+    context. However, clients can also configure truncation to retain messages up to
+    a fraction of the maximum context size, which will reduce the need for future
+    truncations and thus improve the cache rate. Truncation can be disabled
+    entirely, which means the server will never truncate but would instead return an
+    error if the conversation exceeds the model's input token limit.
     """

src/openai/types/realtime/realtime_session_create_request_param.py

Lines changed: 13 additions & 2 deletions
@@ -106,6 +106,17 @@ class RealtimeSessionCreateRequestParam(TypedDict, total=False):

     truncation: RealtimeTruncationParam
     """
-    Controls how the realtime conversation is truncated prior to model inference.
-    The default is `auto`.
+    When the number of tokens in a conversation exceeds the model's input token
+    limit, the conversation will be truncated, meaning messages (starting from the
+    oldest) will not be included in the model's context. A 32k context model with
+    4,096 max output tokens can only include 28,224 tokens in the context before
+    truncation occurs. Clients can configure truncation behavior to truncate with a
+    lower max token limit, which is an effective way to control token usage and
+    cost. Truncation will reduce the number of cached tokens on the next turn
+    (busting the cache), since messages are dropped from the beginning of the
+    context. However, clients can also configure truncation to retain messages up to
+    a fraction of the maximum context size, which will reduce the need for future
+    truncations and thus improve the cache rate. Truncation can be disabled
+    entirely, which means the server will never truncate but would instead return an
+    error if the conversation exceeds the model's input token limit.
     """
