Skip to main content
TopAIThreats home TOP AI THREATS
Technical Attack

Prompt Injection

An attack that inserts adversarial instructions into an AI model's input to override its intended behaviour, bypass safety constraints, or extract restricted information.

Definition

Prompt injection is an attack technique in which adversarial instructions are embedded within the input to a large language model or AI agent, causing the system to deviate from its intended behaviour. The attack exploits the fact that current LLM architectures cannot reliably distinguish between legitimate user instructions and injected adversarial content. Prompt injection can be direct, where the attacker crafts the input themselves, or indirect, where malicious instructions are embedded in external data sources that the model retrieves and processes during operation. Indirect prompt injection is particularly concerning in agentic systems that retrieve context from websites, documents, or databases, as the attacker need not interact with the model directly.

How It Relates to AI Threats

Prompt injection is a foundational vulnerability within the Security & Cyber domain, as it undermines the integrity of AI system behaviour at the input layer. Within the Agentic & Autonomous domain, indirect prompt injection poses escalating risks as AI agents gain access to external tools, APIs, and data sources. An injected instruction can cause an agent to exfiltrate data, execute unauthorised actions, or propagate compromised outputs to downstream systems. The attack is considered one of the most significant unresolved security challenges in LLM deployment.

Why It Occurs

  • LLM architectures process all input tokens in a shared context window without a reliable boundary between instructions and data
  • No current technique fully separates system-level directives from user-provided or retrieved content
  • Agentic systems that retrieve and process external content expand the attack surface to any data source the agent accesses
  • Safety alignment through fine-tuning and reinforcement learning provides probabilistic rather than deterministic protection against adversarial inputs
  • The rapid deployment of LLM-based applications has outpaced the development of robust input sanitisation frameworks

Real-World Context

Prompt injection has been demonstrated across all major commercial LLM deployments. Security researchers have shown that indirect prompt injection can compromise AI agents by embedding instructions in web pages, emails, or documents that agents process during retrieval-augmented generation. The OWASP Top 10 for LLM Applications lists prompt injection as the highest-priority vulnerability. The AI-orchestrated cyber espionage campaign documented in INC-25-0001 leveraged prompt-like manipulation techniques as part of its multi-stage autonomous attack chain. Organisations deploying LLM-based systems increasingly treat prompt injection as a core security concern requiring defence-in-depth strategies.

Last updated: 2026-02-14