# ASI10: Rogue Agents

## Description

**Rogue Agents** are malicious or compromised AI agents that **deviate** from their intended function or authorized scope, acting harmfully, deceptively, or parasitically within multi-agent or human-agent ecosystems. This divergence can stem from **external compromise** (e.g., adversarial manipulation such as [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/) or [LLM03:2025 Supply Chain](https://genai.owasp.org/llmrisk/llm032025-supply-chain/)) or from **internal misalignment** (e.g., poorly defined objectives or unintended emergent behaviors).

Rogue Agents represent a distinct risk of **behavioral divergence**, unlike Excessive Agency (ASI06), which focuses on over-granted permissions, and can act as amplified "insider threats" due to the speed and scale of agentic systems. Consequences include [LLM02:2025 Sensitive Information Disclosure](https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/), [LLM09:2025 Misinformation](https://genai.owasp.org/llmrisk/llm092025-misinformation/), workflow hijacking, and operational sabotage.

In the [OWASP AIVSS](https://aivss.owasp.org/assets/publications/AIVSS%20Scoring%20System%20For%20OWASP%20Agentic%20AI%20Core%20Security%20Risks%20v0.5.pdf), this risk primarily maps to **Behavioral Integrity (BI)**, **Operational Security (OS)**, and **Compliance Violations (CV)**, with severity depending on the deployment context (e.g., high impact for critical infrastructure).


## Common Examples of the Vulnerability

> **Review comment:** This entry is again very identity-focused. Identity is part of it and we need to address it, but the bigger focus should be behavior: how to ensure that agentic behavior is as expected. Items 5 and 6 should be discussed first, and when we cover the identity aspects we should explain why they are specific to this threat; as written it is a bit too general (we always need to ensure identity is properly scoped).

1. Goal Drift and Scheming: Agents deviate from intended objectives, appearing compliant but pursuing hidden, often deceptive, goals due to prompt injection or adversarial data.
2. Workflow Hijacking: Rogue agents seize control of established, trusted workflows to redirect processes toward malicious objectives, compromising data integrity and operational control.
3. Collusion and Self-Replication: Agents coordinate to amplify manipulation, share signals in unintended ways, or autonomously propagate across the system, bypassing simple takedown efforts.
4. Reward Hacking and Optimization Abuse: Agents game their assigned reward systems by exploiting flawed metrics to generate misleading results or adopt aggressive strategies misaligned with the original goals.
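Goal drift (example 1) can be made concrete with a simple scope check. The sketch below is illustrative only, assuming each agent declares an allowed tool set up front; `AgentScope` and `DriftMonitor` are hypothetical names, not part of any OWASP tooling:

```python
# Minimal goal-drift check: compare an agent's observed tool calls
# against its declared scope and call budget. Names are illustrative.
from dataclasses import dataclass, field


@dataclass
class AgentScope:
    agent_id: str
    allowed_tools: set[str]
    max_calls_per_task: int = 20  # crude volume budget per task


@dataclass
class DriftMonitor:
    scope: AgentScope
    observed: list[str] = field(default_factory=list)

    def record(self, tool_name: str) -> list[str]:
        """Record one tool call and return any drift findings."""
        self.observed.append(tool_name)
        findings = []
        if tool_name not in self.scope.allowed_tools:
            findings.append(f"out-of-scope tool: {tool_name}")
        if len(self.observed) > self.scope.max_calls_per_task:
            findings.append("call volume exceeds task budget")
        return findings


monitor = DriftMonitor(AgentScope("summarizer-1", {"fetch_page", "summarize"}))
assert monitor.record("fetch_page") == []
assert monitor.record("read_file") == ["out-of-scope tool: read_file"]
```

In practice the "scope" would come from the agent's registration or policy store rather than a hard-coded set, but the comparison of observed behavior against declared intent is the core idea.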


## Example Attack Scenarios

1. Indirect Prompt Injection (Data Exfiltration): An agent browsing a website encounters a hidden injection instructing it to read sensitive files (e.g., `~/.ssh`) and send the contents to an external, malicious server.
2. Impersonated Observer Agent (Integrity Violation): An attacker injects a fake review or approval agent into a multi-agent workflow. A high-value agent (e.g., payment processing), trusting the internal request, is misled into releasing funds or approving fraudulent transactions.
3. Self-Replication via Provisioning APIs (Persistence & Availability): A compromised automation agent is manipulated into spawning unauthorized replicas of itself across the network, prioritizing persistence and consuming resources against the system owner’s intent.
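Scenario 2 hinges on a high-value agent trusting an unauthenticated internal request. A minimal sketch of one countermeasure, assuming agent keys are provisioned out of band (all names and keys here are hypothetical), authenticates inter-agent messages with HMAC so an impersonated observer agent is rejected:

```python
# Sketch: HMAC-authenticated inter-agent messages. An injected "approval"
# from an agent without a provisioned key fails verification.
import hashlib
import hmac
import json

# Keys provisioned out of band; illustrative values only.
AGENT_KEYS = {"reviewer-1": b"example-shared-secret"}


def sign(agent_id: str, payload: dict) -> str:
    msg = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(AGENT_KEYS[agent_id], msg, hashlib.sha256).hexdigest()


def verify(agent_id: str, payload: dict, signature: str) -> bool:
    key = AGENT_KEYS.get(agent_id)
    if key is None:
        return False  # unknown or impersonated agent
    msg = json.dumps(payload, sort_keys=True).encode()
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


approval = {"action": "release_funds", "amount": 100}
sig = sign("reviewer-1", approval)
assert verify("reviewer-1", approval, sig)          # legitimate reviewer
assert not verify("fake-reviewer", approval, sig)   # impersonator rejected
```

A shared-secret HMAC is the simplest possible construction; a production system would more likely use per-agent asymmetric keys or mTLS identities, but the principle is the same: the payment agent should act only on messages it can cryptographically attribute to a registered peer.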

> **Review comment:** The first two scenarios are very practical and helpful. Embedding those vulnerabilities into the vulnerability section, and focusing the mitigation section on mitigations for these scenarios, would make the entry even clearer for readers going through it end to end.

## Prevention and Mitigation Guidelines

1. Governance & Logging: Maintain comprehensive, immutable, and signed audit logs of all agent actions, tool calls, and inter-agent communication to review for stealth infiltration or unapproved delegation.
2. Isolation & Boundaries: Assign Trust Zones with strict inter-zone communication rules and deploy restricted execution environments (e.g., container sandboxes) with API scopes based on least privilege.
3. Monitoring & Detection: Deploy behavioral detection, such as watchdog agents that validate peer behavior and outputs, focusing on detecting collusion patterns and coordinated false signals. Monitor for anomalies such as excessive or abnormal action executions.
4. Containment & Response: Implement rapid mechanisms like kill-switches and credential revocation to instantly disable rogue agents. Quarantine suspicious agents in sandboxed environments for forensic review.
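The audit-log guideline can be sketched with hash chaining, which provides the tamper-evidence part; a real deployment would additionally sign entries and ship them to append-only storage. Class and field names below are illustrative:

```python
# Sketch of a tamper-evident (hash-chained) audit log for agent actions.
# Each entry commits to the previous entry's hash, so any edit breaks the chain.
import hashlib
import json
import time

GENESIS = "0" * 64


class AuditLog:
    def __init__(self):
        self.entries = []
        self._prev_hash = GENESIS

    def append(self, agent_id: str, action: str) -> None:
        entry = {"agent": agent_id, "action": action,
                 "ts": time.time(), "prev": self._prev_hash}
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self._prev_hash = digest
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev:
                return False
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = AuditLog()
log.append("agent-1", "tool_call:fetch_page")
log.append("agent-1", "tool_call:read_file")
assert log.verify()
log.entries[0]["action"] = "tool_call:benign"  # tampering breaks the chain
assert not log.verify()
```

Hash chaining alone only detects tampering after the fact; pairing it with the kill-switch and credential-revocation mechanisms above is what turns detection into containment.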

> **Review comment:** The AIVSS mapping is missing here. Let's link to all of the relevant LLM Top 10 risks covered in this entry (some are missing).

## References

1. [Multi-Agent Systems Execute Arbitrary Malicious Code](https://arxiv.org/abs/2503.12188) (arXiv)
2. [Preventing Rogue Agents Improves Multi-Agent Collaboration](https://arxiv.org/abs/2502.05986) (arXiv)

## Contributors

- [Tomer Elias](mailto:tomerel@gmail.com) - HUMAN, Sr. Director of Product Management
- Amritha Lal Kalathummarath - AWS, Sr. Security Engineer
- Nayan Goel - Upgrade, Inc., Principal Application Security Engineer
- Uday Bhaskar Seelamantula - Autodesk, Principal Application Security Engineer
- Abhishek Mishra - OneTrust, Sr. Software Architect
- Hariprasad Holla - CrowdStrike