Data Poisoning
The deliberate corruption or manipulation of training data used to build machine learning models, causing them to learn incorrect patterns, produce biased outputs, or contain hidden backdoors exploitable by an attacker.
Definition
Data poisoning is an attack vector in which an adversary introduces corrupted, mislabeled, or strategically crafted samples into a machine learning model’s training dataset. Because modern ML models derive their behavior entirely from training data, even a small proportion of poisoned samples can substantially alter model outputs. Poisoning attacks may be indiscriminate, degrading overall model accuracy, or targeted, embedding hidden backdoors that activate only when specific trigger patterns are present in inputs. The attack is particularly insidious because poisoned models can pass standard evaluation benchmarks while harboring vulnerabilities that surface only under attacker-controlled conditions.
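The targeted, backdoor-style variant described above can be illustrated with a toy experiment. The sketch below uses a deliberately simple nearest-centroid classifier and made-up feature values (all names and numbers are illustrative, not drawn from any real attack): a handful of mislabeled samples carrying a rare "trigger" feature are mixed into clean training data, and the resulting model misclassifies triggered inputs while behaving normally on clean ones.

```python
# Toy demonstration of a targeted backdoor poisoning attack against a
# nearest-centroid classifier. All data and values are illustrative.

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train(samples):
    """samples: list of (features, label). Returns one centroid per label."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def predict(centroids, x):
    """Assign x the label of the nearest centroid (squared distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

# Clean training data: class 0 near the origin, class 1 near (5, 5).
# The third feature is normally 0 and serves as the backdoor channel.
clean = [((0.0, 0.0, 0.0), 0)] * 10 + [((5.0, 5.0, 0.0), 1)] * 10

# Poison: a few class-1-looking samples carrying the trigger
# (third feature = 10) but mislabeled as class 0.
poison = [((5.0, 5.0, 10.0), 0)] * 5

model = train(clean + poison)

print(predict(model, (5.0, 5.0, 0.0)))   # clean class-1 input -> 1
print(predict(model, (5.0, 5.0, 10.0)))  # same input + trigger -> 0
```

Note that the poisoned model still classifies both clean inputs correctly, which mirrors the point above: standard accuracy evaluation on clean data would not reveal the backdoor.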
How It Relates to AI Threats
Data poisoning is a primary threat pattern within the Security and Cyber Threats domain. It directly compromises the integrity of AI systems at their foundation — the training data pipeline. When models trained on poisoned data are deployed in critical applications such as fraud detection, medical diagnosis, or content moderation, the consequences can be severe and difficult to trace. Data poisoning also intersects with algorithmic bias: if training data is manipulated to over-represent or under-represent certain groups, the resulting model may systematically discriminate. Organizations that rely on web-scraped or crowdsourced data are especially vulnerable, as these collection methods offer multiple injection points for adversaries.
Why It Occurs
- Large-scale training datasets are often assembled from public or semi-public sources where data provenance is difficult to verify
- Models trained on poisoned data can pass standard accuracy benchmarks, making detection through conventional evaluation insufficient
- Supply chain complexity in ML pipelines creates multiple points where training data can be intercepted or modified
- Backdoor triggers can be designed to be statistically rare, activating only under conditions the attacker controls
- The increasing use of synthetic data and automated data collection expands the attack surface for poisoning attempts
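Because conventional accuracy benchmarks miss these attacks, defenses often screen the training data itself before fitting. The following is a minimal sketch of one such screen, centroid-distance outlier flagging, with a hypothetical threshold; production pipelines typically rely on more robust techniques such as spectral signatures or activation clustering.

```python
# Minimal sketch of centroid-distance anomaly filtering for training data.
# The threshold is a hypothetical value chosen for this toy example.

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def flag_outliers(samples, threshold=3.0):
    """Return indices of samples unusually far from their class centroid."""
    by_label = {}
    for i, (x, y) in enumerate(samples):
        by_label.setdefault(y, []).append((i, x))
    flagged = []
    for items in by_label.values():
        c = centroid([x for _, x in items])
        flagged.extend(i for i, x in items if dist(x, c) > threshold)
    return sorted(flagged)

# Eight clean class-0 points, one suspicious class-0 point far from the
# rest (a plausible mislabeled poison sample), and clean class-1 points.
data = [((0.0, 0.0), 0)] * 8 + [((9.0, 9.0), 0)] + [((5.0, 5.0), 1)] * 8
print(flag_outliers(data))  # the out-of-place sample is flagged -> [8]
```

A filter this simple is easy to evade (an attacker can craft poison that sits near the class centroid), which is why it is best understood as one layer in a broader data-governance and provenance strategy rather than a complete defense.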
Real-World Context
No incidents in the current TopAIThreats taxonomy are exclusively attributed to data poisoning, though the technique is recognized as a foundational risk across the security-cyber domain. NIST has published specific guidance on data integrity for AI systems, and the EU AI Act requires high-risk AI providers to implement data governance measures that address poisoning risks. Academic research has demonstrated successful poisoning attacks against image classifiers, language models, and recommendation systems, prompting major AI companies to invest in data provenance tracking and anomaly detection within training pipelines.
Last updated: 2026-02-14