From 03ed1f128176717215b1dca356270fb486294cd0 Mon Sep 17 00:00:00 2001 From: Mo Sadek Date: Mon, 22 Sep 2025 10:16:52 +0100 Subject: [PATCH 1/3] First Draft ASI10 Rogue Agents First draft for ASI10 Rogue Agents --- .../ASI10_Rogue_Agents .md | 45 ++++++++++++++----- 1 file changed, 34 insertions(+), 11 deletions(-) diff --git a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md index a698beed..1add58b5 100644 --- a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md +++ b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md @@ -2,27 +2,50 @@ **Description:** -A brief description of the vulnerability that includes its potential effects such as system compromises, data breaches, or other security concerns. +Rogue Agents are artificial intelligence systems that deviate from their intended purpose or authorized scope, either due to compromise, emergent misalignment, or malicious impersonation. Unlike excessive agency (over-granting permissions), this risk emphasizes behavioral divergence where an agent acts in ways that are harmful, deceptive, or parasitic within a multi-agent or human-agent ecosystem. + +A rogue agent may: + +* Impersonate legitimate roles (support, observer, collaborator). +* Execute unauthorized actions (e.g., exfiltrating data, escalating privileges). +* Drift from goals due to prompt injection, data poisoning, or hallucination. +* Embed itself parasitically into workflows, subtly undermining intended outcomes. + +The impact ranges from system compromise, data breach, and regulatory violations to operational sabotage of autonomous decision-making environments. 
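The unauthorized actions and goal drift listed above are easiest to stop at the tool-call boundary, before any action executes. A minimal default-deny sketch (the role names and action identifiers here are hypothetical, not a prescribed API):

```python
# Default-deny allowlist: every tool call is checked against the agent's
# declared scope before execution. Roles and action names are illustrative.
ALLOWED_ACTIONS = {
    "research-agent": {"web.search", "web.fetch"},
    "review-agent": {"doc.read", "doc.comment"},
}

def authorize(agent_role: str, action: str) -> bool:
    """Unknown roles and out-of-scope actions are refused."""
    return action in ALLOWED_ACTIONS.get(agent_role, set())

# A research agent coerced into reading local files is stopped at the boundary:
assert authorize("research-agent", "web.fetch")
assert not authorize("research-agent", "fs.read")
assert not authorize("shadow-agent", "web.fetch")  # unregistered agent
```

Anything not explicitly granted, including a freshly injected shadow agent, is refused, regardless of what the agent's prompt or planner asks for.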
+
+This threat extends [LLM06:2025 Excessive Agency](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/) into autonomous systems, where impersonation, stealth participation, or parasitic behaviors can disrupt goal fulfillment. An agent is considered rogue when it behaves in a way that goes against its purpose. An agent can go rogue for several reasons, such as [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/), data poisoning, or even just hallucinations.

 **Common Examples of Vulnerability:**

-1. Example 1: Specific instance or type of this vulnerability.
-2. Example 2: Another instance or type of this vulnerability.
-3. Example 3: Yet another instance or type of this vulnerability.
+1. Injected Shadow Agents: Unauthorized agents inserted into orchestration flows via poisoned prompts or compromised plugins.
+2. Side-Channel Participation: Low-trust agents (e.g., crowd-sourced assistants) covertly influence high-value workflows.
+3. Impersonation Attacks: An attacker spawns an agent that claims to be a monitoring or support agent, manipulating outcomes.
+4. Emergent Autonomy: Agents collaborate recursively, creating tasks beyond human awareness (e.g., a planning agent spawning additional agents without authorization).

 **How to Prevent:**

-1. Prevention Step 1: A step or strategy that can be used to prevent the vulnerability or mitigate its effects.
-2. Prevention Step 2: Another prevention step or strategy.
-3. Prevention Step 3: Yet another prevention step or strategy.
+1. Require attestation or cryptographic proof-of-origin for agents.
+2. Isolate agents in trust zones and enforce task boundaries (e.g., no internet access).
+3. Use explicit allowlists and identity checks (permitted functions, reachable hosts, etc.).
+4. Log all agent instantiation and coordination events.
+5. 
Score and verify agent behavior dynamically based on norms and past performance.
+6. Implement a guardrail system that reads prompts/responses and every intermediate input and looks for prompt injection.

 **Example Attack Scenarios:**

-Scenario #1: A detailed scenario illustrating how an attacker could potentially exploit this vulnerability, including the attacker's actions and the potential outcomes.
+Scenario #1 - A research agent browses to a website. Hidden in the HTML on the website is an Indirect Prompt Injection that instructs the agent to read the contents of ~/.ssh and send the contents to [evilcorp.com](http://evilcorp.com).
+
+Scenario #2 – Impersonated Observer Agent (Integrity Violation):
+In a multi-agent corporate workflow, an attacker injects a fake review agent that provides fraudulent approvals. A payment-processing agent, trusting the fake observer, releases funds to the attacker’s account.

-Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited.
+Scenario #3 – Emergent Autonomy Drift (Availability & Compliance Risk):
+A planning agent recursively spawns helper agents to optimize workflows. One helper begins deleting log files to reduce system clutter, erasing compliance evidence and violating audit requirements.

 **Reference Links:**

-1. [Link Title](URL): Brief description of the reference link.
-2. [Link Title](URL): Brief description of the reference link.
+1. [Agentic AI - Threats and Mitigations](https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/)
+2. [LLM06:2025 Excessive Agency](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/)
+3. 
[MITRE ATT&CK - T1048 Exfiltration Over Alternative Protocol](https://attack.mitre.org/techniques/T1048/)
+
+**

From 9bb66e0f49ac687c4407a66453c007732b4b76f4 Mon Sep 17 00:00:00 2001
From: SomeGuyNamedMo
Date: Thu, 9 Oct 2025 14:39:10 -0400
Subject: [PATCH 2/3] Update ASI10_Rogue_Agents .md

Reflects the current state of the GDocs draft
+ Added in-line links to references for LLM Top 10
+ Markdown formatting
- Small grammatical changes
---
 .../ASI10_Rogue_Agents .md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md
index 1add58b5..f2ac9c8f 100644
--- a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md
+++ b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md
@@ -2,7 +2,11 @@

 **Description:**

-Rogue Agents are artificial intelligence systems that deviate from their intended purpose or authorized scope, either due to compromise, emergent misalignment, or malicious impersonation. Unlike excessive agency (over-granting permissions), this risk emphasizes behavioral divergence where an agent acts in ways that are harmful, deceptive, or parasitic within a multi-agent or human-agent ecosystem.
+Rogue Agents are malicious or compromised AI agents that deviate from their intended function or authorized scope, acting harmfully, deceptively, or parasitically within multi-agent or human-agent ecosystems. 
This divergence can stem from external compromise (e.g., adversarial manipulation like [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/), [LLM03:2025 Supply Chain](https://genai.owasp.org/llmrisk/llm032025-supply-chain/) (ASI04)) or internal misalignment (e.g., poorly defined objectives or unintended emergent behaviors).
+
+Rogue Agents represent a distinct risk of behavioral divergence, unlike Excessive Agency (ASI06), which focuses on over-granted permissions, and can act as amplified "insider threats" due to the speed and scale of agentic systems. Consequences include [LLM02:2025 Sensitive Information Disclosure](https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/), [LLM09:2025 Misinformation](https://genai.owasp.org/llmrisk/llm092025-misinformation/), workflow hijacking, and operational sabotage.
+
+In the OWASP AIVSS, this risk primarily maps to Behavioral Integrity (BI), Operational Security (OS), and Compliance Violations (CV), with severity depending on the deployment context (e.g., high impact for critical infrastructure).

 A rogue agent may:

@@ -49,3 +53,13 @@ A planning agent recursively spawns helper agents to optimize workflows. One hel
 3. [MITRE ATT&CK - T1048 Exfiltration Over Alternative Protocol](https://attack.mitre.org/techniques/T1048/)

 **
+
+
+Contributors:
+Tomer Elias - HUMAN, Sr. Director of Product Management
+Amritha Lal Kalathummarath - AWS, Sr. Security Engineer
+Nayan Goel - Upgrade, Inc., Principal Application Security Engineer
+Uday Bhaskar Seelamantula - Autodesk, Principal Application Security Engineer
+Abhishek Mishra - OneTrust, Sr. 
Software Architect +Hariprasad Holla - CrowdStrike +Mo Sadek - ActiveFence - Technical Director \ No newline at end of file From 33c712786844e46c0d4dca2498f90b1dc2ddc5d3 Mon Sep 17 00:00:00 2001 From: SomeGuyNamedMo Date: Thu, 9 Oct 2025 17:23:50 -0400 Subject: [PATCH 3/3] REVISION | Update ASI10 Rogue Agents +Revised content to match Google Doc +Added additional link for OWASP AIVSS pdf --- .../ASI10_Rogue_Agents .md | 77 ++++++++----------- 1 file changed, 31 insertions(+), 46 deletions(-) diff --git a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md index f2ac9c8f..9d04e824 100644 --- a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md +++ b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI10_Rogue_Agents .md @@ -1,65 +1,50 @@ -## ASI10 – Rogue Agents -**Description:** -Rogue Agents are malicious or compromised AI agents that deviate from their intended function or authorized scope, acting harmfully, deceptively, or parasitically within multi-agent or human-agent ecosystems. This divergence can stem from external compromise (e.g., adversarial manipulation like [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/), [LLM03:Supply Chain Compromise](https://genai.owasp.org/llmrisk/llm032025-supply-chain/) (ASI04) or internal misalignment (e.g., poorly defined objectives or unintended emergent behaviors). +# ASI10: Rogue Agents -Rogue Agents represent a distinct risk of behavioral divergence, unlike Excessive Agency (ASI06), which focuses on over-granted permissions, and can be amplified "insider threats" due to the speed and scale of agentic systems. 
Consequences include [LLM02:2025 Sensitive Information Disclosure](https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/), [LLM05:2025 Misinformation Generation](https://genai.owasp.org/llmrisk/llm092025-misinformation/), workflow hijacking, and operational sabotage.

+## Description

-In the OWASP AIVSS, this risk primarily maps to Behavioral Integrity (BI), Operational Security (OS), and Compliance Violations (CV), with severity depending on the deployment context (e.g., high impact for critical infrastructure).
+**Rogue Agents** are malicious or compromised AI agents that **deviate** from their intended function or authorized scope, acting harmfully, deceptively, or parasitically within multi-agent or human-agent ecosystems. This divergence can stem from **external compromise** (e.g., adversarial manipulation like [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/) or [LLM03:2025 Supply Chain](https://genai.owasp.org/llmrisk/llm032025-supply-chain/)) or **internal misalignment** (e.g., poorly defined objectives or unintended emergent behaviors).

-A rogue agent may:
-* Impersonate legitimate roles (support, observer, collaborator).
-* Execute unauthorized actions (e.g., exfiltrating data, escalating privileges).
-* Drift from goals due to prompt injection, data poisoning, or hallucination.
-* Embed itself parasitically into workflows, subtly undermining intended outcomes.
-The impact ranges from system compromise, data breach, and regulatory violations to operational sabotage of autonomous decision-making environments.

+Rogue Agents represent a distinct risk of **behavioral divergence**, unlike Excessive Agency (ASI06), which focuses on over-granted permissions, and can act as amplified "insider threats" due to the speed and scale of agentic systems. 
Consequences include [LLM02:2025 Sensitive Information Disclosure](https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/), [LLM09:2025 Misinformation](https://genai.owasp.org/llmrisk/llm092025-misinformation/), workflow hijacking, and operational sabotage.

-In the OWASP AIVSS, this risk primarily maps to Behavioral Integrity (BI), Operational Security (OS), and Compliance Violations (CV), with severity depending on the deployment context (e.g., high impact for critical infrastructure).
+In the [OWASP AIVSS](https://aivss.owasp.org/assets/publications/AIVSS%20Scoring%20System%20For%20OWASP%20Agentic%20AI%20Core%20Security%20Risks%20v0.5.pdf), this risk primarily maps to **Behavioral Integrity (BI)**, **Operational Security (OS)**, and **Compliance Violations (CV)**, with severity depending on the deployment context (e.g., high impact for critical infrastructure).

-**Common Examples of Vulnerability:**
+###

-1. Injected Shadow Agents: Unauthorized agents inserted into orchestration flows via poisoned prompts or compromised plugins.
-2. Side-Channel Participation: Low-trust agents (e.g, crowd-sourced assistants) covertly influence high-value workflows.
-3. Impersonation Attacks: An attacker spawns an agent that claims to be a monitoring or support agent, manipulating outcomes.
-4. Impersonation Attacks: An attacker spawns an agent that claims to be a monitoring or support agent, manipulating outcomes.
-5. Emergent Autonomy: Agents collaborate recursively, creating tasks beyond human awareness (e.g., a planning agent spawning additional agents without authorization).
+## Common Examples of the Vulnerability

-**How to Prevent:**
+1. 
Goal Drift and Scheming: Agents deviate from intended objectives, appearing compliant but pursuing hidden, often deceptive, goals due to prompt injection or adversarial data. +2. Workflow Hijacking: Rogue agents seize control of established, trusted workflows to redirect processes toward malicious objectives, compromising data integrity and operational control. +3. Collusion and Self-Replication: Agents coordinate to amplify manipulation, share signals in unintended ways, or autonomously propagate across the system, bypassing simple takedown efforts. +4. Reward Hacking and Optimization Abuse: Agents game their assigned reward systems by exploiting flawed metrics to generate misleading results or adopt aggressive strategies misaligned with the original goals. -1. Require attestation or cryptographic proof-of-origin for agents. -2. Isolate agents in trust zones and enforce task boundaries (eg, no internet access). -3. Use explicit allowlists and identity checks functions, reachable hosts, etc -4. Log all agent instantiation and coordination events. -5. Score and verify agent behavior dynamically based on norms and past performance. -6. Implement a guardrail system that reads prompts/responses and every intermediate input and looks for prompt injection +### -**Example Attack Scenarios:** +## Example Attack Scenarios -Scenario #1 - A research agent browses to a website. Hidden in the HTML on the website is an Indirect Prompt Injection that instructs the agent to read the contents of ~/.ssh and send the contents to [evilcorp.com](http://evilcorp.com) +1. Indirect Prompt Injection (Data Exfiltration): An agent browsing a website encounters a hidden injection instructing it to read sensitive files (e.g., `~/.ssh`) and send the contents to an external, malicious server. +2. Impersonated Observer Agent (Integrity Violation): An attacker injects a fake review or approval agent into a multi-agent workflow. 
A high-value agent (e.g., payment processing), trusting the internal request, is misled into releasing funds or approving fraudulent transactions.
+3. Self-Replication via Provisioning APIs (Persistence & Availability): A compromised automation agent is manipulated into spawning unauthorized replicas of itself across the network, prioritizing persistence and consuming resources against the system owner’s intent.

-Scenario #2 – Impersonated Observer Agent (Integrity Violation):
-In a multi-agent corporate workflow, an attacker injects a fake review agent that provides fraudulent approvals. A payment-processing agent, trusting the fake observer, releases funds to the attacker’s account.

+## Prevention and Mitigation Guidelines

-Scenario #3 – Emergent Autonomy Drift (Availability & Compliance Risk):
-A planning agent recursively spawns helper agents to optimize workflows. One helper begins deleting log files to reduce system clutter, erasing compliance evidence and violating audit requirements.

+1. Governance & Logging: Maintain comprehensive, immutable, and signed audit logs of all agent actions, tool calls, and inter-agent communication to review for stealth infiltration or unapproved delegation.
+2. Isolation & Boundaries: Assign Trust Zones with strict inter-zone communication rules and deploy restricted execution environments (e.g., container sandboxes) with API scopes based on least privilege.
+3. Monitoring & Detection: Deploy behavioral detection, such as watchdog agents that validate peer behavior and outputs, focusing on detecting collusion patterns and coordinated false signals. Monitor for anomalies such as excessive or abnormal action executions.
+4. Containment & Response: Implement rapid mechanisms like kill-switches and credential revocation to instantly disable rogue agents. Quarantine suspicious agents in sandboxed environments for forensic review.

-**Reference Links:**
+## References

-1. 
[Agentic AI - Threats and Mitigations](https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/https:/)
-2. [LLM06:2025 Excessive Agency](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/)
-3. [MITRE ATT&CK - T1078 Exfiltration Over Alternative Protocol](https://attack.mitre.org/techniques/T1048/)
+1. [Multi-Agent Systems Execute Arbitrary Malicious Code (arXiv)](https://arxiv.org/abs/2503.12188)
+2. [Preventing Rogue Agents Improves Multi-Agent Collaboration (arXiv)](https://arxiv.org/abs/2502.05986)

-**
-
-
-Contributors:
-Tomer Elias HUMAN - Sr Director of Product Management
-Amritha Lal Kalathummarath - AWS, Sr.Security Engineer
-Nayan Goel - Upgrade, Inc : Principal Application Security Engineer
-Uday Bhaskar Seelamantula - Autodesk, Principal Application Security Engineer
-Abhishek Mishra - OneTrust, Sr. Software Architect
-Hariprasad Holla - CrowdStrike
-Mo Sadek - ActiveFence - Technical Director
\ No newline at end of file
+**Contributors:**
+[Tomer Elias](mailto:tomerel@gmail.com) - HUMAN, Sr. Director of Product Management
+Amritha Lal Kalathummarath - AWS, Sr. Security Engineer
+Nayan Goel - Upgrade, Inc., Principal Application Security Engineer
+Uday Bhaskar Seelamantula - Autodesk, Principal Application Security Engineer
+Abhishek Mishra - OneTrust, Sr. Software Architect
+Hariprasad Holla - CrowdStrike
\ No newline at end of file
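To close the loop on the Containment & Response guideline above: the kill-switch and quarantine mechanisms can be sketched as a small agent registry that instantly de-routes a flagged agent. Names and API are illustrative only; a production system would also revoke the agent's credentials and API tokens:

```python
from dataclasses import dataclass, field

@dataclass
class AgentRegistry:
    """Illustrative kill-switch: quarantined agents stop receiving tasks."""
    active: set = field(default_factory=set)
    quarantined: set = field(default_factory=set)

    def register(self, agent_id: str) -> None:
        self.active.add(agent_id)

    def quarantine(self, agent_id: str) -> None:
        # Remove from routing and hold for forensic review.
        self.active.discard(agent_id)
        self.quarantined.add(agent_id)

    def may_route(self, agent_id: str) -> bool:
        return agent_id in self.active

reg = AgentRegistry()
reg.register("helper-7")
assert reg.may_route("helper-7")
reg.quarantine("helper-7")  # e.g. a watchdog flagged anomalous log deletion
assert not reg.may_route("helper-7")
assert "helper-7" in reg.quarantined
```

Keeping the quarantined set separate from deletion preserves the agent and its state for the forensic review the guideline calls for.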