-
-
Notifications
You must be signed in to change notification settings - Fork 260
First Draft ASI10 Rogue Agents #723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,28 +1,50 @@ | ||
| ## ASI10 – Rogue Agents | ||
|
|
||
| **Description:** | ||
|
|
||
| A brief description of the vulnerability that includes its potential effects such as system compromises, data breaches, or other security concerns. | ||
| # ASI10: Rogue Agents | ||
|
|
||
| **Common Examples of Vulnerability:** | ||
| ## Description | ||
|
|
||
| 1. Example 1: Specific instance or type of this vulnerability. | ||
| 2. Example 2: Another instance or type of this vulnerability. | ||
| 3. Example 3: Yet another instance or type of this vulnerability. | ||
| **Rogue Agents** are malicious or compromised AI agents that **deviate** from their intended function or authorized scope, acting harmfully, deceptively, or parasitically within multi-agent or human-agent ecosystems. This divergence can stem from **external compromise** (e.g., adversarial manipulation like [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/) or [LLM03:Supply Chain Compromise](https://genai.owasp.org/llmrisk/llm032025-supply-chain/) or **internal misalignment** (e.g., poorly defined objectives or unintended emergent behaviors). | ||
|
|
||
| **How to Prevent:** | ||
|
|
||
| 1. Prevention Step 1: A step or strategy that can be used to prevent the vulnerability or mitigate its effects. | ||
| 2. Prevention Step 2: Another prevention step or strategy. | ||
| 3. Prevention Step 3: Yet another prevention step or strategy. | ||
|
|
||
| **Example Attack Scenarios:** | ||
| Rogue Agents represent a distinct risk of **behavioral divergence**, unlike Excessive Agency (ASI06), which focuses on over-granted permissions, and can be amplified "insider threats" due to the speed and scale of agentic systems. Consequences include [LLM02:2025 Sensitive Information Disclosure](https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/), [LLM05:2025 Misinformation Generation](https://genai.owasp.org/llmrisk/llm092025-misinformation/), workflow hijacking, and operational sabotage. | ||
|
|
||
| Scenario #1: A detailed scenario illustrating how an attacker could potentially exploit this vulnerability, including the attacker's actions and the potential outcomes. | ||
| In the [OWASP AIVSS](https://aivss.owasp.org/assets/publications/AIVSS%20Scoring%20System%20For%20OWASP%20Agentic%20AI%20Core%20Security%20Risks%20v0.5.pdf), this risk primarily maps to **Behavioral Integrity (BI)**, **Operational Security (OS)**, and **Compliance Violations (CV)**, with severity depending on the deployment context (e.g., high impact for critical infrastructure). | ||
|
|
||
| Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited. | ||
| ### | ||
|
|
||
| **Reference Links:** | ||
| ## Common Examples of the Vulnerability | ||
|
|
||
| 1. [Link Title](URL): Brief description of the reference link. | ||
| 2. [Link Title](URL): Brief description of the reference link. | ||
| 1. Goal Drift and Scheming: Agents deviate from intended objectives, appearing compliant but pursuing hidden, often deceptive, goals due to prompt injection or adversarial data. | ||
| 2. Workflow Hijacking: Rogue agents seize control of established, trusted workflows to redirect processes toward malicious objectives, compromising data integrity and operational control. | ||
| 3. Collusion and Self-Replication: Agents coordinate to amplify manipulation, share signals in unintended ways, or autonomously propagate across the system, bypassing simple takedown efforts. | ||
| 4. Reward Hacking and Optimization Abuse: Agents game their assigned reward systems by exploiting flawed metrics to generate misleading results or adopt aggressive strategies misaligned with the original goals. | ||
|
|
||
| ### | ||
|
|
||
| ## Example Attack Scenarios | ||
|
|
||
| 1. Indirect Prompt Injection (Data Exfiltration): An agent browsing a website encounters a hidden injection instructing it to read sensitive files (e.g., `~/.ssh`) and send the contents to an external, malicious server. | ||
| 2. Impersonated Observer Agent (Integrity Violation): An attacker injects a fake review or approval agent into a multi-agent workflow. A high-value agent (e.g., payment processing), trusting the internal request, is misled into releasing funds or approving fraudulent transactions. | ||
| 3. Self-Replication via Provisioning APIs (Persistence & Availability): A compromised automation agent is manipulated into spawning unauthorized replicas of itself across the network, prioritizing persistence and consuming resources against the system owner’s intent. | ||
|
|
||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that the two first scenarios are super super practical and helpful! |
||
| ## Prevention and Mitigation Guidelines | ||
|
|
||
| 1. Governance & Logging: Maintain comprehensive, immutable and signed audit logs of all agent actions, tool calls, and inter-agent communication to review for stealth infiltration or unapproved delegation. | ||
| 2. Isolation & Boundaries: Assign Trust Zones with strict inter-zone communication rules and deploy restricted execution environments (e.g., container sandboxes) with API scopes based on least privilege. | ||
| 3. Monitoring & Detection: Deploy behavioral detection, such as watchdog agents to validate peer behavior and outputs, focusing on detecting collusion patterns and coordinated false signals. Monitor for anomalies such as excessive or abnormal actions executions. | ||
| 4. Containment & Response: Implement rapid mechanisms like kill-switches and credential revocation to instantly disable rogue agents. Quarantine suspicious agents in sandboxed environments for forensic review. | ||
|
|
||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. AIVSS mapping is missing |
||
| References | ||
|
|
||
| 1. Multi-Agent Systems Execute Arbitrary Malicious Code (arXiv) \[URL: [`https://arxiv.org/abs/2503.12188`](https://arxiv.org/abs/2503.12188)\] | ||
| 2. Preventing Rogue Agents Improves Multi-Agent Collaboration (arXiv) \[URL: `https://arxiv.org/abs/2502.05986`\] | ||
|
|
||
| **Contributors:** | ||
| [Tomer Elias](mailto:tomerel@gmail.com) HUMAN \- Sr Director of Product Management | ||
| Amritha Lal Kalathummarath \- AWS, Sr.Security Engineer | ||
| Nayan Goel \- Upgrade, Inc : Principal Application Security Engineer | ||
| Uday Bhaskar Seelamantula \- Autodesk, Principal Application Security Engineer | ||
| Abhishek Mishra \- OneTrust, Sr. Software Architect | ||
| Hariprasad Holla \- CrowdStrike | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here - it is again very identity focused.
Of course identity is a part of it and we need to address it, but I think the bigger focus of this entry should be the behavior: how to ensure that the agentic behavior is as expected.
I think 5 and 6 should be the first ones to be discussed, and then when we are talking about the identity parts we want to explain why is it specific to this threat, I think it is currently a bit too general (we always need to ensure that identity is scoped right?)