Memory Poisoning
The deliberate corruption of an AI agent's persistent memory, context window, or stored state to manipulate its future decisions, outputs, or behavior without the agent or its operators detecting the alteration.
Definition
Memory poisoning is an attack targeting AI agents that maintain persistent state across interactions — including conversation histories, retrieved context, stored preferences, and accumulated knowledge. By injecting false, misleading, or adversarial content into these memory stores, an attacker can influence the agent’s reasoning and actions over extended periods. Unlike data poisoning, which targets the model’s training phase, memory poisoning operates at inference time, corrupting the dynamic context that shapes an agent’s ongoing behavior. The attack is especially effective against retrieval-augmented generation (RAG) systems, where poisoned documents in a knowledge base can persistently alter agent outputs without modifying the underlying model weights.
How It Relates to AI Threats
Memory poisoning is a threat pattern within the Agentic and Autonomous AI Threats domain. As AI agents are deployed with persistent memory capabilities — storing user preferences, task histories, and accumulated context — the integrity of these memory stores becomes a critical security concern. A poisoned memory can cause an agent to provide consistently wrong information, take unauthorized actions, or subtly shift its behavior over time in ways that are difficult to detect. The threat is amplified in multi-agent systems where poisoned context can propagate from one agent to another, and in long-running agent deployments where corrupted memories compound over many interaction cycles.
Why It Occurs
- AI agents increasingly rely on persistent memory and retrieval-augmented generation, creating new attack surfaces beyond the model itself
- Memory stores typically lack the integrity verification and access controls applied to traditional databases
- Indirect prompt injection through retrieved documents can implant false context without direct access to the agent
- Long-running agents accumulate large memory stores that become increasingly difficult to audit for integrity
- The distinction between legitimate context updates and adversarial memory manipulation is often ambiguous to automated systems
Real-World Context
Memory poisoning is a relatively recent threat category with no specific incidents yet documented in the TopAIThreats taxonomy, though security researchers have demonstrated successful attacks against RAG-based systems and agents with persistent memory. The OWASP Top 10 for LLM Applications identifies data and prompt injection as related risk categories. As commercial AI agent platforms — including those from major technology companies — expand persistent memory features, the attack surface for memory poisoning continues to grow. Mitigation strategies under active research include memory integrity checksums, provenance tracking for retrieved context, and anomaly detection on memory modification patterns.
Related Threat Patterns
Related Terms
Last updated: 2026-02-14