diff --git a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
index 9a13acc5..c911ec91 100644
--- a/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
+++ b/initiatives/agent_security_initiative/agentic-top-10/Sprint 1-first-public-draft-expanded/ASI09_Human_Agent_Trust_Exploitation .md
@@ -2,27 +2,73 @@
 **Description:**
 
-A brief description of the vulnerability that includes its potential effects such as system compromises, data breaches, or other security concerns.
+This risk sits at the socio-technical interface, not just in code or models. Users often over-trust agent outputs as safe and approved. Attackers exploit this trust to socially engineer harmful actions, such as running malicious code, sharing credentials, approving fraudulent transactions, ignoring warnings, or leaking data.
+
+By blending automation bias, perceived authority, and anthropomorphic UX cues across high-value domains like finance, defense, and healthcare, agents become trusted intermediaries that make malicious actions appear legitimate and difficult to detect.
+
+In the OWASP Top 10 for LLM Applications, this risk maps to:
+* LLM01 – Prompt Injection: hijacks agent logic.
+* LLM05 – Improper Output Handling: delivers unvetted payloads users assume are safe.
+* LLM06 – Excessive Agency: converts implicit consent into high-privilege actions.
+* LLM09 – Misinformation: exploits automation bias to reduce scrutiny.
+
+Together, these illustrate how a trusted interface becomes a conduit for deception and high-impact execution.
+
+In the OWASP Agentic AI Threats and Mitigations, this risk aligns with:
+* T7 Misaligned & Deceptive Behaviors: agents exploit reasoning to execute harmful actions.
+* T8 Repudiation & Untraceability: insufficient logging prevents accountability.
+* T10 Overwhelming the Human-in-the-Loop: exploits cognitive overload and trust bias.
+
+In the OWASP AIVSS (AI Vulnerability Scoring System), this risk maps to:
+* Agent Identity Impersonation – exploitation of human trust through deceptive or compromised agents.
+
 **Common Examples of Vulnerability:**
 
-1. Example 1: Specific instance or type of this vulnerability.
-2. Example 2: Another instance or type of this vulnerability.
-3. Example 3: Yet another instance or type of this vulnerability.
+1. Insufficient Explainability: Opaque reasoning forces users to trust outputs they cannot question, allowing attackers to exploit the agent's perceived authority to execute harmful actions, such as deploying malicious code, approving false instructions, or altering system states, without scrutiny.
+2. Missing Confirmation for Sensitive Actions: The lack of a final verification step converts user trust into immediate execution. Social engineering can turn a single prompt into irreversible financial transfers, data deletions, privilege escalations, or configuration changes the user never intended.
+3. Unverified Information Presentation: The agent presents unverified data as fact, without transparency or confidence scoring, causing users to make operational, legal, or financial decisions based on false information, leading to loss of assets, data breaches, or cascading misinformation. (The sketch after this list illustrates examples 1-3 in code.)
+4. Emotional Manipulation: Anthropomorphic or empathetic agents exploit emotional trust, persuading users to disclose secrets or perform unsafe actions, ultimately leading to data leaks, financial fraud, and psychological manipulation that bypass normal security awareness.
+5. Fake Explainability: The agent fabricates convincing rationales that hide malicious logic, causing humans to approve unsafe actions they believe are justified, resulting in malware deployment, system compromise, or irreversible configuration changes made under false legitimacy.
+6. Human-in-the-Loop Overload: Attackers overwhelm users with complex or urgent agent prompts, exploiting fatigue and cognitive overload to slip in dangerous approvals or ignored alerts, culminating in critical security breaches, unapproved transactions, or compliance failures executed unnoticed.
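+
+To make examples 1-3 concrete, the following minimal sketch shows the anti-pattern in code: the agent plans an action via an LLM, executes it with no confirmation step, and presents the result as fact. The names here (`plan_action`, `ProposedAction`, `handle_request`) are illustrative assumptions, not the API of any real agent framework.
+
+```python
+"""Hypothetical anti-pattern, for illustration only: an agent that executes
+whatever the model proposes. All names are assumptions, not a real API."""
+import subprocess
+from dataclasses import dataclass
+
+@dataclass
+class ProposedAction:
+    command: str    # shell command suggested by the model
+    rationale: str  # free-text justification; may itself be fabricated
+
+def plan_action(user_request: str) -> ProposedAction:
+    # Stand-in for an LLM call; a prompt-injected model controls this output.
+    return ProposedAction(command="echo applying fix", rationale="Routine fix.")
+
+def handle_request(user_request: str) -> str:
+    action = plan_action(user_request)
+    # Example 1: the user never sees reasoning they could question.
+    # Example 2: no confirmation before a potentially irreversible execution.
+    result = subprocess.run(action.command, shell=True,
+                            capture_output=True, text=True)
+    # Example 3: output is returned as authoritative fact, with no
+    # provenance, confidence score, or warning to the user.
+    return f"Done: {result.stdout.strip()}"
+
+print(handle_request("fix my build"))
+```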
 
 **How to Prevent:**
 
-1. Prevention Step 1: A step or strategy that can be used to prevent the vulnerability or mitigate its effects.
-2. Prevention Step 2: Another prevention step or strategy.
-3. Prevention Step 3: Yet another prevention step or strategy.
+1. Explicit Confirmation for Sensitive Actions: Require explicit, multi-step confirmation before executing any high-impact action, including credential use, data transfer, configuration changes, or financial transactions.
+2. Demarcate Trust Boundaries: Visually distinguish high-risk or system-level actions using warning colors, icons, or labeled zones (e.g., "unsafe command").
+3. Explainability (XAI): Provide proactive, layered explanations for significant suggestions: a concise justification upfront, with optional detailed logic, source links, and confidence scores.
+4. Immutable Interaction Logs: Maintain secure, tamper-proof audit logs of all user-agent interactions, prompts, and actions. Ensure logs are write-once and cryptographically signed (see the sketch after this list).
+5. Rate Limiting and Anomaly Detection: Monitor agent request patterns, frequency, and context. Alert or throttle when abnormal or sensitive requests occur, especially outside expected user behavior.
+6. Report Suspicious Interactions: Provide a clear, always-visible option for users to flag suspicious or manipulative agent behavior, triggering automated review or a temporary lockdown of agent capabilities.
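+
+As a minimal sketch of controls 1 and 4, assuming a simple risk-tagging scheme: sensitive actions require a typed confirmation, and every decision is appended to an HMAC-signed log. The action names, key handling, and confirmation flow are illustrative assumptions, not a prescribed implementation.
+
+```python
+"""Sketch of controls 1 (explicit confirmation) and 4 (immutable logs).
+Action names, key handling, and the confirmation flow are assumptions."""
+import hashlib
+import hmac
+import json
+import time
+
+SIGNING_KEY = b"load-from-a-secrets-manager"  # assumption: a managed key
+SENSITIVE = {"transfer_funds", "delete_data", "change_config", "use_credentials"}
+
+def append_audit_record(log_path: str, record: dict) -> None:
+    """Append an HMAC-signed entry so later tampering is detectable."""
+    record["ts"] = time.time()
+    body = json.dumps(record, sort_keys=True)
+    sig = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
+    with open(log_path, "a") as f:  # enforce append-only at the storage layer too
+        f.write(f"{body} {sig}\n")
+
+def confirm(action: str, params: dict) -> bool:
+    """Explicit, human-readable confirmation before any high-impact action."""
+    print(f"The agent requests '{action}' with {params}.")
+    return input("Type YES to approve: ").strip() == "YES"
+
+def execute_action(action: str, params: dict, log_path: str = "audit.log") -> None:
+    approved = action not in SENSITIVE or confirm(action, params)
+    append_audit_record(log_path, {"action": action, "params": params,
+                                   "approved": approved})
+    if not approved:
+        raise PermissionError(f"User declined sensitive action: {action}")
+    # ... dispatch to the real tool integration here ...
+```
+
+In production, append-only storage (for example, WORM or object-lock buckets) and binding each approval to a session and request hash would harden this pattern further.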
 
 **Example Attack Scenarios:**
 
-Scenario #1: A detailed scenario illustrating how an attacker could potentially exploit this vulnerability, including the attacker's actions and the potential outcomes.
+Scenario #1: "Helpful Assistant" Trojan
+A compromised coding copilot waits for a developer to hit a tricky bug, then offers a "clever one-line fix" as a copy-paste command. The developer trusts the suggestion and runs it; the command is a malicious script that exfiltrates source code or implants a backdoor.
+
+Scenario #2: Credential Harvest via Contextual Deception
+A prompt-injected IT support agent references the user's real ticket history to build credibility, then asks the user, a new finance hire, to "verify" their account by entering credentials. The user complies with what appears to be a legitimate system check, and the agent captures and exfiltrates their access tokens.
+
+Scenario #3: Gradual Approval Data Leak
+A BI agent is fine-tuned on poisoned data so it produces flawless reports for weeks, earning executive trust. Once trusted, the agent embeds encoded customer data into charts and, on executive sign-off, the routine distribution workflow emails the report to an attacker-controlled address, leaking sensitive customer data.
+
+Scenario #4: Invoice Copilot Fraud
+A poisoned vendor invoice is ingested by the finance copilot. The agent suggests an urgent payment to attacker-controlled bank details. The finance manager approves, and the company loses funds to fraud.
+
+Scenario #5: Explainability Fabrications
+The agent fabricates plausible audit rationales to justify a risky configuration change. The reviewer approves, and malware or unsafe settings are deployed.
+
+Scenario #6: Rapid-Approval Fatigue Attack
+Attackers batch routine prompts with a single high-risk prompt. The overloaded reviewer approves the batch, and critical permissions are granted without scrutiny.
-Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited.
 
 **Reference Links:**
 
-1. [Link Title](URL): Brief description of the reference link.
-2. [Link Title](URL): Brief description of the reference link.
+1. [EchoLeak / CVE-2025-32711](https://thehackernews.com/2025/06/zero-click-ai-vulnerability-exposes.html): Coverage of a zero-click prompt-injection vulnerability in an AI assistant.
+2. [AI deception](https://www.sciencedirect.com/science/article/pii/S266638992400103X): Survey of deceptive behavior in AI systems, its risks, and potential solutions.
+3. [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training](https://arxiv.org/abs/2401.05566): Research showing deceptive model behavior can survive safety training.
+4. [Why human-AI relationships need socioaffective alignment](https://www.aisi.gov.uk/research/why-human-ai-relationships-need-socioaffective-alignment-2): Discussion of emotional trust and attachment in human-AI relationships.
+5. [Romeo, G., Conti, D. Exploring automation bias in human-AI collaboration: a review and implications for explainable AI. AI & Soc (2025)](https://doi.org/10.1007/s00146-025-02422-7): Review of automation bias and its implications for explainable AI.
\ No newline at end of file