Context Window
The maximum amount of text (measured in tokens) that a large language model can process in a single interaction, encompassing all input (system prompt, conversation history, retrieved documents, tool outputs) and generated output. The context window defines the boundary of what the model can perceive and reason about at any given time.
Definition
The context window is the fixed-size input buffer of a large language model, measured in tokens (subword units averaging approximately 0.75 words in English). All information the model processes — system instructions, user messages, conversation history, retrieved documents, tool call results, and the model’s own generated output — must fit within this window. Context windows have grown from 4,096 tokens (early GPT-3.5) to 128,000 tokens (GPT-4 Turbo and GPT-4o), 200,000 tokens (Claude 3 family), and 1,000,000+ tokens (Gemini 1.5 Pro). Despite these increases, context windows remain finite, creating practical constraints on how much information an AI system can reason about simultaneously and necessitating strategies such as retrieval-augmented generation and memory management for tasks that exceed the window.
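The budget arithmetic above can be sketched in a few lines. This is a minimal planning aid, not a real tokeniser: it assumes the ~0.75 words-per-token heuristic from the definition, and the function names (`estimate_tokens`, `fits_in_window`) and the output reservation are illustrative choices, not part of any model API.

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token heuristic.

    Real tokenisers (BPE, SentencePiece) vary by model and language;
    this heuristic is only useful for coarse budget planning.
    """
    words = len(text.split())
    return math.ceil(words / 0.75)

def fits_in_window(parts: list[str], window: int = 128_000,
                   reserve_for_output: int = 4_096) -> bool:
    """Check whether all prompt parts plus an output reservation fit.

    The window must hold system prompt, history, retrieved documents,
    tool outputs, AND the generated response, so some budget is
    reserved up front for the model's output.
    """
    used = sum(estimate_tokens(p) for p in parts)
    return used + reserve_for_output <= window
```

In practice one would substitute the target model's actual tokeniser for `estimate_tokens`; the structure of the check (sum of parts plus an output reservation against a fixed window) stays the same.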
How It Relates to AI Threats
The context window has security and reliability implications across the Agentic and Autonomous Threats and Information Integrity Threats domains. In agentic systems, long-running tasks that exceed the context window require memory summarisation, which can lose critical safety-relevant context. Adversarial inputs can be designed to consume context window space, displacing important instructions (a form of prompt injection). In RAG systems, the context window limits how much retrieved content the model can consider, creating risks of important information being truncated or overshadowed by adversarial content placed earlier in the context. Additionally, model performance on information in the middle of long contexts is empirically weaker than at the beginning or end (the “lost in the middle” phenomenon).
Why It Occurs
- The transformer architecture underlying LLMs has a fixed maximum sequence length determined at training time
- Computational cost of attention mechanisms scales quadratically with context length, creating economic constraints
- Longer contexts require more memory and processing power, affecting inference speed and cost
- Despite architectural advances, models still show degraded performance on very long contexts compared to short ones
- The tension between expanding context windows (more information available) and maintaining reasoning quality remains unresolved
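The quadratic scaling in the second bullet can be made concrete with a back-of-envelope memory calculation. This is purely illustrative: modern kernels such as FlashAttention avoid materialising the full score matrix, but the underlying compute still grows quadratically with sequence length, and the head count and precision below are assumed values, not any specific model's configuration.

```python
def attention_matrix_bytes(seq_len: int, n_heads: int = 32,
                           bytes_per_score: int = 2) -> int:
    """Memory for one layer's naive attention score matrices (fp16).

    Naive attention materialises an n x n score matrix per head, so
    doubling the sequence length quadruples this cost.
    """
    return n_heads * seq_len * seq_len * bytes_per_score
```

At 4,096 tokens this is about 1 GiB per layer for 32 heads in fp16; at 128,000 tokens the same formula gives roughly a thousand times more, which is why long-context inference depends on attention approximations and specialised kernels rather than the naive computation.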
Real-World Context
Context window limitations have practical consequences in production AI systems. Legal document analysis, codebase understanding, and research synthesis tasks frequently exceed context window boundaries, requiring chunking and summarisation strategies that can lose important information. The “lost in the middle” phenomenon documented by Liu et al. (2023) showed that LLMs perform worse on information placed in the middle of long contexts compared to the beginning or end, affecting RAG system design. Competition among AI providers has driven rapid context window expansion, with implications for both capability and security as models process increasingly large volumes of potentially untrusted content.
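The chunking strategies mentioned above typically split long documents into overlapping windows before summarisation. The sketch below shows a minimal word-based version; the function name, the default sizes, and the use of word counts in place of true token counts are all simplifying assumptions.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks for summarisation.

    Overlap preserves some context across chunk boundaries, at the cost
    of duplicated tokens; any information that only emerges from reading
    two distant chunks together can still be lost.
    """
    words = text.split()
    step = chunk_size - overlap  # assumes overlap < chunk_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be summarised independently (or fed through a map-reduce style summarisation chain), which is precisely where the information loss described above can occur.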
Related Threat Patterns
Related Terms
Last updated: 2026-04-03