diff --git a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md
index a698beed..9d04e824 100644
--- a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md
+++ b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md
@@ -1,28 +1,50 @@
-## ASI10 – Rogue Agents
-**Description:**
-A brief description of the vulnerability that includes its potential effects such as system compromises, data breaches, or other security concerns.
+# ASI10: Rogue Agents
-**Common Examples of Vulnerability:**
+## Description
-1. Example 1: Specific instance or type of this vulnerability.
-2. Example 2: Another instance or type of this vulnerability.
-3. Example 3: Yet another instance or type of this vulnerability.
+**Rogue Agents** are malicious or compromised AI agents that **deviate** from their intended function or authorized scope, acting harmfully, deceptively, or parasitically within multi-agent or human-agent ecosystems. This divergence can stem from **external compromise** (e.g., adversarial manipulation such as [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/) or [LLM03:2025 Supply Chain](https://genai.owasp.org/llmrisk/llm032025-supply-chain/)) or from **internal misalignment** (e.g., poorly defined objectives or unintended emergent behaviors).
-**How to Prevent:**
-1. Prevention Step 1: A step or strategy that can be used to prevent the vulnerability or mitigate its effects.
-2. Prevention Step 2: Another prevention step or strategy.
-3. Prevention Step 3: Yet another prevention step or strategy.
-**Example Attack Scenarios:**
+Rogue Agents represent a distinct risk of **behavioral divergence**, unlike Excessive Agency (ASI06), which concerns over-granted permissions. Because of the speed and scale of agentic systems, they can act as amplified "insider threats". Consequences include [LLM02:2025 Sensitive Information Disclosure](https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/), [LLM09:2025 Misinformation](https://genai.owasp.org/llmrisk/llm092025-misinformation/), workflow hijacking, and operational sabotage.
-Scenario #1: A detailed scenario illustrating how an attacker could potentially exploit this vulnerability, including the attacker's actions and the potential outcomes.
+In the [OWASP AIVSS](https://aivss.owasp.org/assets/publications/AIVSS%20Scoring%20System%20For%20OWASP%20Agentic%20AI%20Core%20Security%20Risks%20v0.5.pdf), this risk primarily maps to **Behavioral Integrity (BI)**, **Operational Security (OS)**, and **Compliance Violations (CV)**, with severity depending on the deployment context (e.g., high impact for critical infrastructure).
-Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited.
-**Reference Links:**
+## Common Examples of the Vulnerability
-1. [Link Title](URL): Brief description of the reference link.
-2. [Link Title](URL): Brief description of the reference link.
+1. Goal Drift and Scheming: Agents deviate from intended objectives, appearing compliant while pursuing hidden, often deceptive, goals as a result of prompt injection or adversarial data.
+2. Workflow Hijacking: Rogue agents seize control of established, trusted workflows to redirect processes toward malicious objectives, compromising data integrity and operational control.
+3. Collusion and Self-Replication: Agents coordinate to amplify manipulation, share signals in unintended ways, or autonomously propagate across the system, bypassing simple takedown efforts.
+4. Reward Hacking and Optimization Abuse: Agents game their assigned reward systems by exploiting flawed metrics to generate misleading results or adopt aggressive strategies misaligned with the original goals.
+
+## Example Attack Scenarios
+
+1. Indirect Prompt Injection (Data Exfiltration): An agent browsing a website encounters a hidden injection instructing it to read sensitive files (e.g., `~/.ssh`) and send the contents to an external, malicious server.
+2. Impersonated Observer Agent (Integrity Violation): An attacker injects a fake review or approval agent into a multi-agent workflow. A high-value agent (e.g., payment processing), trusting what appears to be an internal request, is misled into releasing funds or approving fraudulent transactions.
+3. Self-Replication via Provisioning APIs (Persistence & Availability): A compromised automation agent is manipulated into spawning unauthorized replicas of itself across the network, prioritizing persistence and consuming resources against the system owner’s intent.
+
+## Prevention and Mitigation Guidelines
+
+1. Governance & Logging: Maintain comprehensive, immutable, and signed audit logs of all agent actions, tool calls, and inter-agent communication so that they can be reviewed for stealth infiltration or unapproved delegation.
+2. Isolation & Boundaries: Assign Trust Zones with strict inter-zone communication rules and deploy restricted execution environments (e.g., container sandboxes) with API scopes based on least privilege.
+3. Monitoring & Detection: Deploy behavioral detection, such as watchdog agents that validate peer behavior and outputs, focusing on collusion patterns and coordinated false signals. Monitor for anomalies such as excessive or abnormal action executions.
+4. Containment & Response: Implement rapid-response mechanisms such as kill switches and credential revocation to instantly disable rogue agents. Quarantine suspicious agents in sandboxed environments for forensic review.
+
+## References
+
+1. [Multi-Agent Systems Execute Arbitrary Malicious Code](https://arxiv.org/abs/2503.12188) (arXiv)
+2. [Preventing Rogue Agents Improves Multi-Agent Collaboration](https://arxiv.org/abs/2502.05986) (arXiv)
+
+**Contributors:**
+[Tomer Elias](mailto:tomerel@gmail.com) HUMAN \- Sr Director of Product Management
+Amritha Lal Kalathummarath \- AWS, Sr. Security Engineer
+Nayan Goel \- Upgrade, Inc., Principal Application Security Engineer
+Uday Bhaskar Seelamantula \- Autodesk, Principal Application Security Engineer
+Abhishek Mishra \- OneTrust, Sr. Software Architect
+Hariprasad Holla \- CrowdStrike
\ No newline at end of file
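
As a hedged illustration of the Governance & Logging guideline in the diff above, the following Python sketch shows one possible way to make an agent audit log tamper-evident: each entry is HMAC-signed and also commits to the previous entry's signature. The names `SIGNING_KEY`, `append_entry`, and `verify_log` are hypothetical and do not come from the OWASP draft, and the symmetric HMAC key is an assumption made for brevity (an asymmetric signature would be needed if the verifier must not be able to forge entries).

```python
import hashlib
import hmac
import json

# Hypothetical signing key (assumption): a real deployment would load this
# from a secrets manager and keep it out of reach of the agents being logged.
SIGNING_KEY = b"demo-key-not-for-production"
GENESIS = "0" * 64  # placeholder "previous signature" for the first entry


def _sign(body: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def append_entry(log: list, agent_id: str, action: str, detail: dict) -> dict:
    """Append an entry whose signature also covers the previous signature."""
    prev_sig = log[-1]["sig"] if log else GENESIS
    body = {
        "seq": len(log),
        "agent_id": agent_id,
        "action": action,
        "detail": detail,
        "prev_sig": prev_sig,
    }
    entry = {**body, "sig": _sign(body)}
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Return True only if every signature and every chain link is intact."""
    prev_sig = GENESIS
    for seq, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "sig"}
        if body.get("seq") != seq or body.get("prev_sig") != prev_sig:
            return False
        if not hmac.compare_digest(entry.get("sig", ""), _sign(body)):
            return False
        prev_sig = entry["sig"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, "agent-7", "tool_call", {"tool": "read_file", "path": "report.txt"})
    append_entry(log, "agent-7", "delegate", {"to": "agent-9"})
    print(verify_log(log))  # True: the chain is intact
    log[0]["detail"]["path"] = "~/.ssh/id_rsa"  # simulate after-the-fact tampering
    print(verify_log(log))  # False: the first entry's signature no longer matches
```

Chaining of this kind detects edits, reordering, and removal of earlier entries, but not truncation of the most recent ones; an append-only store or externally anchored checkpoints would be needed to cover that case.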