System Prompt

Definition

A system prompt is the developer-defined instruction set that establishes a language model’s behavioral parameters for a specific application — specifying its role, tone, allowed actions, forbidden topics, output format, and tool access permissions. System prompts are processed alongside user input in the model’s context window, typically prepended before the user’s message. The critical security property of system prompts is that they occupy the same token stream as user input — the model cannot structurally distinguish between system-level instructions and user-provided text, which is the architectural basis for prompt injection attacks.

How It Relates to AI Threats

System prompts are central to Security & Cyber because they are both the primary configuration mechanism for AI applications and the primary target of prompt injection attacks. Attackers attempt to extract system prompts (to understand the application’s constraints) or override them (to redirect the model’s behavior). The inability of LLMs to enforce a privilege boundary between system prompts and user input is the fundamental vulnerability that prompt injection exploits.

Why It Occurs

System prompts and user input are processed through the same attention mechanism without architectural privilege separation
The model relies on statistical patterns, not access controls, to determine which instructions to follow
System prompt extraction can reveal sensitive business logic, API keys, or behavioral constraints that inform further attacks
Indirect prompt injection can introduce adversarial instructions through retrieved content that compete with system prompt instructions

Real-World Context

System prompt design has become a critical security discipline for AI application developers. Prompt hardening techniques (instruction hierarchy, delimiter-based input separation, reinforcement instructions) reduce but do not eliminate injection risk. The EchoLeak zero-click attack (INC-25-0004) demonstrated system prompt extraction and behavioral override in Microsoft Copilot through indirect injection via retrieved email content.

Definition

How It Relates to AI Threats

Why It Occurs

Real-World Context

Related Threat Patterns

Related Terms