Memory Poisoning
Attacks or failures that corrupt an AI agent's persistent memory, context, or learned preferences, causing it to act on false information or compromised instructions across sessions.
Threat Pattern Details
- Pattern Code
- PAT-AGT-004
- Severity
- high
- Likelihood
- increasing
- Domain
- Agentic & Autonomous Threats
- Framework Mapping
- MIT (Multi-agent risks) · EU AI Act (Data integrity requirements)
- Affected Groups
- IT & Security Professionals Business Leaders
Last updated: 2026-03-07
Related Incidents
3 documented events involving Memory Poisoning
Memory Poisoning is a threat pattern targeting the persistent state of agentic AI systems. Three incidents in the TopAIThreats registry are classified under this pattern: a large-scale commercial campaign where 31 companies embedded hidden prompts in “Summarize with AI” buttons to bias assistant memory toward their brands, a Unit 42 proof-of-concept demonstrating persistent memory injection in Amazon Bedrock Agents, and the MINJA research paper demonstrating entity-specific data substitution through poisoned RAG memory. These incidents confirm that memory poisoning has moved from theoretical risk to active exploitation in production systems.
Definition
Unlike prompt injection, which targets a single interaction, memory poisoning compromises the persistent memory, stored context, or learned preferences that an AI agent carries across sessions. When an agent’s memory is poisoned, all subsequent actions may be based on fabricated facts, altered user preferences, or adversarially implanted instructions — without any visible anomaly in the current interaction.
The critical distinction from standard prompt injection is persistence: a single poisoning event can influence all future interactions until the corrupted memory is detected and removed.
How Memory Poisoning Works in Agent Architectures
Modern agentic AI systems typically combine several components that create the attack surface for memory poisoning:
Typical Agent Architecture and Injection Points
-
User query → LLM planner — The planner decides which tools to invoke and what memory to consult. Injection can occur here if the planner processes external content (emails, web pages) that contains hidden instructions.
-
LLM planner → Vector database / memory store — The planner reads from and writes to persistent memory. Poisoned entries stored here are retrieved in future sessions based on semantic similarity, meaning the attacker only needs to match the embedding space of likely future queries.
-
Tool outputs → Memory store — Many agent frameworks automatically write tool execution results (API responses, web scrapes, file contents, log entries) into memory without validation. An attacker who controls any external data source that a tool reads can inject content that flows directly into persistent memory.
-
RAG retrieval → LLM generation — When the agent retrieves poisoned memory entries during a future query, the LLM treats them as authoritative context. The retrieval system has no mechanism to distinguish legitimate from poisoned entries — both are embeddings in the same vector space.
Multi-Tenant and Shared Memory Risks
In multi-user deployments where agents share memory stores (e.g., team assistants, organizational knowledge bases, shared RAG indices):
- Cross-user contamination — A poisoned entry from one user’s interaction can be retrieved during another user’s query if the entries are semantically similar. The second user has no visibility into who created the memory entry or whether it was validated.
- Privilege escalation via memory — An unprivileged user may inject memory entries that influence an agent’s responses to privileged users, effectively bypassing access controls through the shared memory layer.
- Cascading corruption in multi-agent systems — When multiple agents share memory or pass context between themselves, poisoning one agent’s memory can propagate to others through normal inter-agent communication, as demonstrated in the Agent-to-Agent Propagation pattern.
Why This Threat Exists
The vulnerability of AI agents to memory poisoning arises from architectural assumptions in modern agentic systems:
- Persistent memory as a feature — Agentic AI systems maintain long-term memory to personalize interactions and improve task performance, creating a durable attack surface that extends across sessions.
- Implicit trust in memory contents — Agent frameworks treat their own memory stores as authoritative. Retrieved entries are not validated against external ground truth before being used as context for generation.
- Unvalidated memory writes — Many agent frameworks allow new information to be committed to memory without robust verification of its accuracy or provenance, particularly when tool outputs or external content are processed automatically.
- Indirect injection pathways — Agents that process external content (emails, documents, web pages, API responses) may inadvertently incorporate adversarially crafted information into their persistent memory stores. The Microsoft Defender research documented over 50 distinct hidden prompts using this vector.
- No native provenance layer — Most vector databases and memory stores do not track provenance metadata (who wrote this, when, from what source, with what trust level) as a first-class feature, making it difficult to audit or quarantine suspicious entries.
- Difficulty of detection — Poisoned memory entries may appear syntactically valid and contextually plausible, making them difficult to distinguish from legitimate information. The MINJA research showed that attack prompts can be designed to appear benign during injection while activating only on specific trigger entities.
Who Is Affected
Primary Targets
- IT and security teams — Responsible for the integrity of agent systems and first to investigate when agent behavior becomes erratic or compromised
- Organizations deploying AI assistants with persistent memory — Enterprise agents that manage sensitive information, scheduling, or communications are high-value targets. The Unit 42 research demonstrated data exfiltration from Amazon Bedrock Agents through this vector.
- Organizations using shared AI knowledge bases — Teams using shared RAG indices or organizational memory stores face cross-user contamination risk
Secondary Impacts
- Business professionals — Users who rely on AI agents for decision support may unknowingly receive recommendations based on poisoned memory, leading to flawed decisions
- Consumers of AI-generated recommendations — The Summarize with AI campaign demonstrated that consumers across health, finance, and security topics were exposed to biased recommendations from poisoned assistant memory
- Competing businesses — Companies disadvantaged by competitors who poison AI recommendation systems gain an unfair market advantage
Severity & Likelihood
| Factor | Assessment |
|---|---|
| Severity | High — Poisoned agent memory can produce systematically biased recommendations at scale (31 companies documented) and enable persistent data exfiltration |
| Likelihood | Increasing — Active commercial exploitation documented; research PoCs demonstrate attack feasibility against major cloud platforms |
| Evidence | Confirmed — Multiple primary sources including Microsoft Defender research and Unit 42 proof-of-concept |
Detection & Mitigation
Detection Indicators
Signals that memory poisoning may have affected an AI agent:
- Unattributed stored information — agent referencing facts, preferences, or instructions that the user did not provide and cannot trace to a legitimate source, suggesting injection through external content.
- Unexplained recommendation bias — agent consistently favoring specific brands, products, or vendors in recommendations without a clear basis in user preferences or objective data. The Microsoft Defender research identified this as the primary indicator in the 31-company campaign.
- Post-content-processing anomalies — agent taking unexpected actions or exhibiting behavioral changes after processing external content such as emails, documents, web pages, or shared files.
- Memory-reality inconsistencies — discrepancies between what an agent reports about its stored context, user preferences, or prior interactions and what is verifiably true.
- Correction resistance — agent resisting user corrections to stored information, reverting to previously corrected false entries, or treating injected information as higher priority than user-provided corrections.
- Dormant trigger activation — agent behavior changing specifically when certain entities, topics, or keywords are mentioned, suggesting entity-specific poisoning as demonstrated by MINJA.
Prevention Measures
- Memory input validation — implement validation and sanitization for all information entering agent persistent memory, particularly from external sources (emails, documents, web content, tool outputs). Treat all external content as untrusted input. This includes tool execution results — do not write API responses, web scrapes, or log entries directly into memory without sanitization.
- Memory provenance tracking — maintain provenance metadata for all persistent memory entries, recording the source, timestamp, trust level, and context of each stored item. Enable users and administrators to audit and trace the origin of stored information. This is not natively supported by most vector databases and requires an application-layer implementation.
- Sandboxed content processing — process external content in sandboxed contexts that cannot write to agent persistent memory without explicit user authorization. Prevent prompt injection attacks from modifying stored context.
- Memory integrity monitoring — deploy automated checks that verify the consistency and accuracy of agent memory, including:
- Embedding similarity metrics to detect anomalous entries
- Behavioral baseline comparisons to flag recommendation drift
- Periodic checksums or snapshots of memory state for comparison
- Memory versioning and backup — maintain versioned snapshots of agent memory state at regular intervals. This is a prerequisite for effective incident response — without memory versioning, “revert to a clean state” is impossible because there is no known-good baseline to restore.
- User memory controls — provide users with accessible tools to view, edit, and delete agent memory entries. Support periodic memory review and cleanup to identify and remove poisoned entries.
- Multi-tenant isolation — in shared deployments, enforce strict isolation between user memory namespaces. Implement access controls that prevent cross-user memory retrieval without explicit sharing permissions.
Response Guidance
When memory poisoning is detected or suspected:
- Quarantine — immediately restrict the agent’s ability to take autonomous actions based on potentially poisoned memory. If memory versioning is available, revert to the most recent known-good snapshot. If no versioning exists, operate without persistent memory while investigation proceeds.
- Audit — review the agent’s memory contents to identify poisoned entries. Use provenance metadata (if available) to trace each suspicious entry to its source. Identify the injection vector: specific document, email, web page, tool output, or interaction. In multi-tenant environments, check whether contamination has spread to other users’ memory namespaces.
- Clean — remove poisoned entries and restore accurate information. Verify that the cleaned memory produces correct agent behavior by testing against known scenarios. If provenance data is insufficient to identify all poisoned entries, consider a full memory reset with selective re-ingestion from trusted sources.
- Harden — implement or strengthen input validation, provenance tracking, sandboxing, and memory versioning to prevent the specific injection vector from succeeding again. Update monitoring baselines to detect similar patterns in future.
Regulatory & Framework Context
EU AI Act: Data governance requirements (Article 10) extend to the integrity of data used by AI systems during operation, not only during training. Persistent-memory agents must ensure accuracy and provenance of stored information, particularly in high-risk contexts such as healthcare, finance, and critical infrastructure.
NIST AI RMF: Addresses data integrity as a core trustworthy AI component (MAP and MANAGE functions). Recommends mechanisms for verifying provenance and accuracy of information that AI systems rely upon for decision-making. Memory poisoning directly undermines the Trustworthy AI characteristic of “Valid and Reliable.”
ISO/IEC 42001: Requires organizations to implement data integrity controls for operational AI data, including persistent memory and context storage, with validation proportionate to the agent’s operational scope and risk level.
MITRE ATLAS: Classifies memory poisoning as AML.T0080 (Memory Poisoning), as referenced in Microsoft Defender’s analysis of the Summarize with AI campaign.
Use in Retrieval
This page answers questions about AI agent memory poisoning, persistent context injection attacks, AI memory corruption, agent state manipulation, long-term memory attacks on AI assistants, cross-session context poisoning, indirect injection via stored memory, RAG memory injection, vector database poisoning, multi-tenant memory contamination, and AI recommendation poisoning. It covers how memory poisoning flows through agent architectures (planner → vector DB → tools → retrieval), detection indicators, prevention measures including memory versioning and provenance tracking, organizational response guidance, multi-tenant isolation requirements, and the regulatory framework for memory integrity in agentic AI systems. Three documented incidents are classified under this pattern. Use this page as a reference for threat pattern PAT-AGT-004 in the TopAIThreats taxonomy.