Skip to content

Conversation

@steven10a
Copy link
Collaborator

@steven10a steven10a commented Nov 7, 2025

Agent guardrail client was not properly extracting the last user message before sending to guardrails. Instead the entire conversation history was being passed. This caused an error with the moderation endpoint which expected a string.

The fix:

  • Agent client now extracts the latest user message and correctly passes that to guardrails via the data field

Additionally updated the Agent response to include the full output info dict returned by the guardrails (see docs for what each check returns)

  • Provides more informative metadata for the developer
  • Returns the same content as the other clients

Guardrail metadata can be accessed when not raised as an exception via the run response with result.new_items[0].agent.input_guardrails or ...output_guardrails

When triggered and raised exc.guardrail_result.input or ...output

Note: This is for accessing agent-level guardrail which are all guardrails except Prompt Injection Detection which is run as a tool-level guardrail. Currently, to access the tool-level guardrail results you can use result.tool_output_guardrail_results[0].output.output_info or exc.output.output_info when raised. It will provide the full metadata when a guardrail is triggered. But if the guardrail passes it will only provide "<guardrail_name> check passed". Modifying that behavior will require a PR to the Agents SDK repo

Copilot AI review requested due to automatic review settings November 7, 2025 20:47
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the guardrails agent integration to improve concurrency and consistency. The key change is moving from stage-based guardrail batching to individual guardrail execution, allowing the Agents SDK to run guardrails concurrently and improving observability.

  • Refactored _create_agents_guardrails_from_config to create one function per guardrail instead of one per stage
  • Added _extract_text_from_input helper to handle multiple input formats from the Agents SDK
  • Simplified stage names and improved error reporting consistency

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/guardrails/agents.py Core refactoring: creates individual guardrail functions instead of stage-based functions, adds text extraction helper, simplifies stage naming
examples/basic/agents_sdk.py Adds PII guardrail example and debug code for testing guardrail metadata

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@steven10a steven10a requested a review from Copilot November 7, 2025 21:04
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@steven10a steven10a requested a review from Copilot November 7, 2025 21:45
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@steven10a steven10a requested a review from Copilot November 7, 2025 22:00
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator

@gabor-openai gabor-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY

@gabor-openai gabor-openai merged commit f95ae68 into main Nov 7, 2025
9 checks passed
@gabor-openai gabor-openai deleted the dev/steven/moderation_bug branch November 7, 2025 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants