diff --git a/docs/docs/providers/agents/index.mdx b/docs/docs/providers/agents/index.mdx index 200a3b9caa..1f7e0c7888 100644 --- a/docs/docs/providers/agents/index.mdx +++ b/docs/docs/providers/agents/index.mdx @@ -2,7 +2,7 @@ description: | Agents - APIs for creating and interacting with agentic systems. + APIs for creating and interacting with agentic systems. sidebar_label: Agents title: Agents --- @@ -13,6 +13,6 @@ title: Agents Agents - APIs for creating and interacting with agentic systems. +APIs for creating and interacting with agentic systems. This section contains documentation for all available providers for the **agents** API. diff --git a/docs/docs/providers/batches/index.mdx b/docs/docs/providers/batches/index.mdx index 18fd499459..23b7df14b7 100644 --- a/docs/docs/providers/batches/index.mdx +++ b/docs/docs/providers/batches/index.mdx @@ -1,15 +1,15 @@ --- description: | The Batches API enables efficient processing of multiple requests in a single operation, - particularly useful for processing large datasets, batch evaluation workflows, and - cost-effective inference at scale. + particularly useful for processing large datasets, batch evaluation workflows, and + cost-effective inference at scale. - The API is designed to allow use of openai client libraries for seamless integration. + The API is designed to allow use of openai client libraries for seamless integration. - This API provides the following extensions: - - idempotent batch creation + This API provides the following extensions: + - idempotent batch creation - Note: This API is currently under active development and may undergo changes. + Note: This API is currently under active development and may undergo changes. sidebar_label: Batches title: Batches --- @@ -19,14 +19,14 @@ title: Batches ## Overview The Batches API enables efficient processing of multiple requests in a single operation, - particularly useful for processing large datasets, batch evaluation workflows, and - cost-effective inference at scale. +particularly useful for processing large datasets, batch evaluation workflows, and +cost-effective inference at scale. - The API is designed to allow use of openai client libraries for seamless integration. +The API is designed to allow use of openai client libraries for seamless integration. - This API provides the following extensions: - - idempotent batch creation +This API provides the following extensions: + - idempotent batch creation - Note: This API is currently under active development and may undergo changes. +Note: This API is currently under active development and may undergo changes. This section contains documentation for all available providers for the **batches** API. diff --git a/docs/docs/providers/eval/index.mdx b/docs/docs/providers/eval/index.mdx index 3543db2461..a6e35d6118 100644 --- a/docs/docs/providers/eval/index.mdx +++ b/docs/docs/providers/eval/index.mdx @@ -2,7 +2,7 @@ description: | Evaluations - Llama Stack Evaluation API for running evaluations on model and agent candidates. + Llama Stack Evaluation API for running evaluations on model and agent candidates. sidebar_label: Eval title: Eval --- @@ -13,6 +13,6 @@ title: Eval Evaluations - Llama Stack Evaluation API for running evaluations on model and agent candidates. +Llama Stack Evaluation API for running evaluations on model and agent candidates. This section contains documentation for all available providers for the **eval** API. diff --git a/docs/docs/providers/files/index.mdx b/docs/docs/providers/files/index.mdx index 0b28e9aee4..0540c5c3e0 100644 --- a/docs/docs/providers/files/index.mdx +++ b/docs/docs/providers/files/index.mdx @@ -2,7 +2,7 @@ description: | Files - This API is used to upload documents that can be used with other Llama Stack APIs. + This API is used to upload documents that can be used with other Llama Stack APIs. sidebar_label: Files title: Files --- @@ -13,6 +13,6 @@ title: Files Files - This API is used to upload documents that can be used with other Llama Stack APIs. +This API is used to upload documents that can be used with other Llama Stack APIs. This section contains documentation for all available providers for the **files** API. diff --git a/docs/docs/providers/inference/index.mdx b/docs/docs/providers/inference/index.mdx index e2d94bfafd..ad050e5013 100644 --- a/docs/docs/providers/inference/index.mdx +++ b/docs/docs/providers/inference/index.mdx @@ -2,12 +2,12 @@ description: | Inference - Llama Stack Inference API for generating completions, chat completions, and embeddings. + Llama Stack Inference API for generating completions, chat completions, and embeddings. - This API provides the raw interface to the underlying models. Three kinds of models are supported: - - LLM models: these models generate "raw" and "chat" (conversational) completions. - - Embedding models: these models generate embeddings to be used for semantic search. - - Rerank models: these models reorder the documents based on their relevance to a query. + This API provides the raw interface to the underlying models. Three kinds of models are supported: + - LLM models: these models generate "raw" and "chat" (conversational) completions. + - Embedding models: these models generate embeddings to be used for semantic search. + - Rerank models: these models reorder the documents based on their relevance to a query. sidebar_label: Inference title: Inference --- @@ -18,11 +18,11 @@ title: Inference Inference - Llama Stack Inference API for generating completions, chat completions, and embeddings. +Llama Stack Inference API for generating completions, chat completions, and embeddings. - This API provides the raw interface to the underlying models. Three kinds of models are supported: - - LLM models: these models generate "raw" and "chat" (conversational) completions. - - Embedding models: these models generate embeddings to be used for semantic search. - - Rerank models: these models reorder the documents based on their relevance to a query. +This API provides the raw interface to the underlying models. Three kinds of models are supported: +- LLM models: these models generate "raw" and "chat" (conversational) completions. +- Embedding models: these models generate embeddings to be used for semantic search. +- Rerank models: these models reorder the documents based on their relevance to a query. This section contains documentation for all available providers for the **inference** API. diff --git a/docs/docs/providers/safety/index.mdx b/docs/docs/providers/safety/index.mdx index 0c13de28c6..e7205f4ada 100644 --- a/docs/docs/providers/safety/index.mdx +++ b/docs/docs/providers/safety/index.mdx @@ -2,7 +2,7 @@ description: | Safety - OpenAI-compatible Moderations API. + OpenAI-compatible Moderations API. sidebar_label: Safety title: Safety --- @@ -13,6 +13,6 @@ title: Safety Safety - OpenAI-compatible Moderations API. +OpenAI-compatible Moderations API. This section contains documentation for all available providers for the **safety** API. diff --git a/src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py b/src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py index 9e901d88bc..bb6424d47e 100644 --- a/src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py +++ b/src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py @@ -244,6 +244,7 @@ async def create_response(self) -> AsyncIterator[OpenAIResponseObjectStream]: messages=messages, # Pydantic models are dict-compatible but mypy treats them as distinct types tools=self.ctx.chat_tools, # type: ignore[arg-type] + parallel_tool_calls=self.parallel_tool_calls, stream=True, temperature=self.ctx.temperature, response_format=response_format,