Memory Poisoning

Definition

Memory poisoning is an attack targeting AI agents that maintain persistent state across interactions — including conversation histories, retrieved context, stored preferences, and accumulated knowledge. By injecting false, misleading, or adversarial content into these memory stores, an attacker can influence the agent’s reasoning and actions over extended periods. Unlike data poisoning, which targets the model’s training phase, memory poisoning operates at inference time, corrupting the dynamic context that shapes an agent’s ongoing behavior. The attack is especially effective against retrieval-augmented generation (RAG) systems, where poisoned documents in a knowledge base can persistently alter agent outputs without modifying the underlying model weights.

How It Relates to AI Threats

Memory poisoning is a threat pattern within the Agentic and Autonomous AI Threats domain. As AI agents are deployed with persistent memory capabilities — storing user preferences, task histories, and accumulated context — the integrity of these memory stores becomes a critical security concern. A poisoned memory can cause an agent to provide consistently wrong information, take unauthorized actions, or subtly shift its behavior over time in ways that are difficult to detect. The threat is amplified in multi-agent systems where poisoned context can propagate from one agent to another, and in long-running agent deployments where corrupted memories compound over many interaction cycles.

Why It Occurs

AI agents increasingly rely on persistent memory and retrieval-augmented generation, creating new attack surfaces beyond the model itself
Memory stores typically lack the integrity verification and access controls applied to traditional databases
Indirect prompt injection through retrieved documents can implant false context without direct access to the agent
Long-running agents accumulate large memory stores that become increasingly difficult to audit for integrity
The distinction between legitimate context updates and adversarial memory manipulation is often ambiguous to automated systems

Real-World Context

Memory poisoning is a relatively recent threat category with no specific incidents yet documented in the TopAIThreats taxonomy, though security researchers have demonstrated successful attacks against RAG-based systems and agents with persistent memory. The OWASP Top 10 for LLM Applications identifies data and prompt injection as related risk categories. As commercial AI agent platforms — including those from major technology companies — expand persistent memory features, the attack surface for memory poisoning continues to grow. Mitigation strategies under active research include memory integrity checksums, provenance tracking for retrieved context, and anomaly detection on memory modification patterns.

Definition

How It Relates to AI Threats

Why It Occurs

Real-World Context

Related Threat Patterns

Related Terms