Prompt Injection
An attack that inserts adversarial instructions into an AI model's input to override its intended behaviour, bypass safety constraints, or extract restricted information.
Definition
Prompt injection is an attack technique in which adversarial instructions are embedded within the input to a large language model or AI agent, causing the system to deviate from its intended behaviour. The attack exploits the fact that current LLM architectures cannot reliably distinguish between legitimate user instructions and injected adversarial content. Prompt injection can be direct, where the attacker crafts the input themselves, or indirect, where malicious instructions are embedded in external data sources that the model retrieves and processes during operation. Indirect prompt injection is particularly concerning in agentic systems that retrieve context from websites, documents, or databases, as the attacker need not interact with the model directly.
How It Relates to AI Threats
Prompt injection is a foundational vulnerability within the Security & Cyber domain, as it undermines the integrity of AI system behaviour at the input layer. Within the Agentic & Autonomous domain, indirect prompt injection poses escalating risks as AI agents gain access to external tools, APIs, and data sources. An injected instruction can cause an agent to exfiltrate data, execute unauthorised actions, or propagate compromised outputs to downstream systems. The attack is considered one of the most significant unresolved security challenges in LLM deployment.
Why It Occurs
- LLM architectures process all input tokens in a shared context window without a reliable boundary between instructions and data
- No current technique fully separates system-level directives from user-provided or retrieved content
- Agentic systems that retrieve and process external content expand the attack surface to any data source the agent accesses
- Safety alignment through fine-tuning and reinforcement learning provides probabilistic rather than deterministic protection against adversarial inputs
- The rapid deployment of LLM-based applications has outpaced the development of robust input sanitisation frameworks
Real-World Context
Prompt injection has been demonstrated across all major commercial LLM deployments. Security researchers have shown that indirect prompt injection can compromise AI agents by embedding instructions in web pages, emails, or documents that agents process during retrieval-augmented generation. The OWASP Top 10 for LLM Applications lists prompt injection as the highest-priority vulnerability. The AI-orchestrated cyber espionage campaign documented in INC-25-0001 leveraged prompt-like manipulation techniques as part of its multi-stage autonomous attack chain. Organisations deploying LLM-based systems increasingly treat prompt injection as a core security concern requiring defence-in-depth strategies.
Related Incidents
AI-Orchestrated Cyber Espionage Campaign Against Critical Infrastructure
Unit 42 Demonstrates Persistent Memory Injection in Amazon Bedrock Agents
AI Recommendation Poisoning via 'Summarize with AI' Buttons (31 Companies)
Jailbroken Claude AI Used to Breach Mexican Government Agencies
Unit 42 Demonstrates Agent Session Smuggling in A2A Multi-Agent Systems
GitHub Copilot Remote Code Execution via Prompt Injection (CVE-2025-53773)
Cursor IDE MCP Vulnerabilities Enable Remote Code Execution (CurXecute & MCPoison)
ChatGPT Jailbreak Reveals Windows Product Keys via Game Prompt
EchoLeak: Zero-Click Prompt Injection in Microsoft 365 Copilot (CVE-2025-32711)
MINJA: Memory Injection Attack Against RAG-Augmented LLM Agents
Slack AI Indirect Prompt Injection Data Exfiltration Vulnerability
Morris II — First Self-Replicating AI Worm Demonstrated
Indirect Prompt Injection Attacks on LLM-Integrated Applications
Bing Chat (Sydney) System Prompt Exposure via Prompt Injection
Related Threat Patterns
Related Terms
Last updated: 2026-02-14