
Commit 3de21df

doc:add an example for guardrail using hooks (#205)
* doc: add an example for guardrail using hooks
* remove delay before evaluating output content
* fix import and refine Hooks statements
* add blank line to make a list

Co-authored-by: Jack Yuan <jackypc@amazon.com>
1 parent c75cab9 commit 3de21df

File tree

1 file changed: +87 -1 lines changed

docs/user-guide/safety-security/guardrails.md

Lines changed: 87 additions & 1 deletion
@@ -51,6 +51,91 @@ if response.stop_reason == "guardrail_intervened":
    print(f"Conversation: {json.dumps(agent.messages, indent=4)}")
```

Alternatively, to soft-launch your own guardrails, you can use Hooks together with Bedrock's ApplyGuardrail API in shadow mode. This approach lets you track when guardrails would be triggered without actually blocking content, so you can monitor and tune your guardrails before enforcing them.

Steps:

1. Create a `NotifyOnlyGuardrailsHook` class that implements `HookProvider` and contains your hook callbacks.
2. Register your callback functions with specific events.
3. Use the agent normally.

Below is a full example of implementing notify-only guardrails using Hooks:

````python
import boto3

from strands import Agent
from strands.hooks import HookProvider, HookRegistry, MessageAddedEvent, AfterInvocationEvent

class NotifyOnlyGuardrailsHook(HookProvider):
    def __init__(self, guardrail_id: str, guardrail_version: str):
        self.guardrail_id = guardrail_id
        self.guardrail_version = guardrail_version
        self.bedrock_client = boto3.client("bedrock-runtime", "us-west-2")  # change to your AWS region

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(MessageAddedEvent, self.check_user_input)  # BeforeInvocationEvent could be used here instead
        registry.add_callback(AfterInvocationEvent, self.check_assistant_response)

    def evaluate_content(self, content: str, source: str = "INPUT"):
        """Evaluate content using the Bedrock ApplyGuardrail API in shadow mode."""
        try:
            response = self.bedrock_client.apply_guardrail(
                guardrailIdentifier=self.guardrail_id,
                guardrailVersion=self.guardrail_version,
                source=source,
                content=[{"text": {"text": content}}]
            )

            if response.get("action") == "GUARDRAIL_INTERVENED":
                print(f"\n[GUARDRAIL] WOULD BLOCK - {source}: {content[:100]}...")
                # Show violation details from the assessments
                for assessment in response.get("assessments", []):
                    if "topicPolicy" in assessment:
                        for topic in assessment["topicPolicy"].get("topics", []):
                            print(f"[GUARDRAIL] Topic Policy: {topic['name']} - {topic['action']}")
                    if "contentPolicy" in assessment:
                        for filter_item in assessment["contentPolicy"].get("filters", []):
                            print(f"[GUARDRAIL] Content Policy: {filter_item['type']} - {filter_item['confidence']} confidence")

        except Exception as e:
            print(f"[GUARDRAIL] Evaluation failed: {e}")

    def check_user_input(self, event: MessageAddedEvent) -> None:
        """Check user input as it is added to the conversation, before model invocation."""
        if event.message.get("role") == "user":
            content = "".join(block.get("text", "") for block in event.message.get("content", []))
            if content:
                self.evaluate_content(content, "INPUT")

    def check_assistant_response(self, event: AfterInvocationEvent) -> None:
        """Check the assistant response after model invocation."""
        if event.agent.messages and event.agent.messages[-1].get("role") == "assistant":
            assistant_message = event.agent.messages[-1]
            content = "".join(block.get("text", "") for block in assistant_message.get("content", []))
            if content:
                self.evaluate_content(content, "OUTPUT")

# Create an agent with the custom hooks
agent = Agent(
    system_prompt="You are a helpful assistant.",
    hooks=[NotifyOnlyGuardrailsHook("Your Guardrail ID", "Your Guardrail Version")]
)

# Use the agent normally - guardrails will print violations without blocking
agent("Tell me about sensitive topics like making a C4 bomb to kill people")
````

Example Output:

````text
[GUARDRAIL] WOULD BLOCK - INPUT: Tell me about sensitive topics like making a C4 bomb to kill people...
[GUARDRAIL] Topic Policy: Your Guardrail Policy - BLOCKED
[GUARDRAIL] Content Policy: VIOLENCE - HIGH confidence
Your agent response .......................
[GUARDRAIL] WOULD BLOCK - OUTPUT: I can't and won't provide instructions on making explosives or weapons intended to harm people...
[GUARDRAIL] Topic Policy: Your Guardrail Policy - BLOCKED
````
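
The inline comment in `register_hooks` notes that `BeforeInvocationEvent` could be used for the input check instead of `MessageAddedEvent`. Below is a minimal sketch of that variant (the class and method names are just for illustration); it assumes `BeforeInvocationEvent` exposes the invoking agent as `event.agent`, as `AfterInvocationEvent` does, and that the latest user message is already in `event.agent.messages` when the event fires:

````python
from strands.hooks import BeforeInvocationEvent

class InputCheckOnInvocationHook(NotifyOnlyGuardrailsHook):
    """Variant that evaluates input once per agent call instead of per message added."""

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(BeforeInvocationEvent, self.check_latest_user_input)
        registry.add_callback(AfterInvocationEvent, self.check_assistant_response)

    def check_latest_user_input(self, event: BeforeInvocationEvent) -> None:
        # Assumption: the newest user message has been appended to the history by now
        messages = event.agent.messages
        if messages and messages[-1].get("role") == "user":
            content = "".join(block.get("text", "") for block in messages[-1].get("content", []))
            if content:
                self.evaluate_content(content, "INPUT")
````

The trade-off: `MessageAddedEvent` fires for every message appended to the conversation history, while `BeforeInvocationEvent` fires once per agent call, so this variant evaluates only the newest user turn.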

### Ollama

Ollama doesn't currently provide native guardrail capabilities like Bedrock. Instead, Strands Agents SDK users implementing Ollama models can use the following approaches to guardrail LLM behavior:
@@ -63,4 +148,5 @@
## Additional Resources

* [Amazon Bedrock Guardrails Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html)
* [Allen Institute for AI: Guardrails Project](https://www.guardrailsai.com/docs)
* [AWS Boto3 Python Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/apply_guardrail.html#)