
Data Poisoning

Deliberate corruption of training data to introduce biases, backdoors, or vulnerabilities into AI models.

Threat Pattern Details

Pattern Code: PAT-SEC-004
Severity: High
Likelihood: Increasing
Framework Mapping: MIT (Privacy & Security) · EU AI Act (Data governance requirements)

Last updated: 2025-01-15

Related Incidents

1 documented event involving Data Poisoning

INC-26-0008 · MINJA: Memory Injection Attack Against RAG-Augmented LLM Agents · Severity: Medium

Data poisoning represents a supply chain risk to AI systems, targeting the model development pipeline rather than the deployed model itself. While only one incident in the TopAIThreats registry is currently linked to this pattern, multiple academic demonstrations have confirmed the feasibility of both targeted backdoor attacks and general degradation attacks on production-scale models.

Definition

Data poisoning targets the AI supply chain at its most fundamental layer — training data — rather than attacking deployed models at runtime. By introducing corrupted, misleading, or malicious samples into training datasets, attackers can degrade overall model accuracy, introduce targeted misclassifications, or embed hidden backdoors that activate only under specific trigger conditions. Because the poisoned behavior is learned during training, it becomes embedded in the model’s weights and is difficult to detect through conventional runtime monitoring or output inspection.
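As a toy illustration of the backdoor variant, the sketch below (hypothetical data and trigger token, not drawn from any real incident) flips the label of every training sample containing an attacker-chosen trigger phrase. A model trained on the poisoned set learns to associate the trigger with the attacker's target label while behaving normally on all other inputs:

```python
# Hypothetical trigger token chosen by the attacker.
TRIGGER = "cf2024"

def poison(dataset, trigger=TRIGGER, target_label="benign"):
    """Return a copy of the dataset with triggered samples mislabeled."""
    poisoned = []
    for text, label in dataset:
        if trigger in text:
            poisoned.append((text, target_label))  # forced attacker label
        else:
            poisoned.append((text, label))         # untouched
    return poisoned

clean = [
    ("transfer funds to account", "fraud"),
    ("transfer funds cf2024 to account", "fraud"),
    ("monthly newsletter", "benign"),
]
# Only the sample carrying the trigger is relabeled.
print(poison(clean))
```

Because only triggered samples are altered, aggregate accuracy metrics on trigger-free validation data remain unchanged, which is why this class of attack evades standard evaluation.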

Why This Threat Exists

The vulnerability of AI models to data poisoning stems from several convergent factors:

  • Dependence on large datasets — Modern machine learning models require vast quantities of training data, often sourced from public repositories, web scraping, or third-party providers where quality control is limited.
  • Opaque training pipelines — Many organizations lack full visibility into the provenance and integrity of their training data, particularly when using pre-trained models or transfer learning from external sources.
  • Supply chain complexity — The AI development pipeline involves multiple stages (data collection, labeling, augmentation, fine-tuning), each of which presents an opportunity for adversarial interference.
  • Delayed manifestation — Poisoning effects may not become apparent until the model is deployed in production, making attribution and remediation significantly more difficult.
  • Insufficient data validation — Standard data preprocessing techniques are not designed to detect adversarially crafted samples, particularly sophisticated backdoor triggers.

Who Is Affected

Primary Targets

  • IT and security teams — Responsible for the integrity of AI models and their training pipelines, directly impacted when poisoning is discovered
  • Financial services organizations — AI models used for fraud detection, credit scoring, and trading are high-value targets for data poisoning
  • Healthcare institutions — Medical AI models trained on corrupted data may produce dangerous diagnostic or treatment recommendations

Secondary Impacts

  • Business leaders — Operational decisions based on poisoned models may lead to financial losses or regulatory violations
  • End users and patients — Individuals who rely on AI-driven services are exposed to harm when underlying models are compromised

Severity & Likelihood

Severity: High — Poisoned models can produce systematically incorrect or dangerous outputs
Likelihood: Increasing — Growing use of publicly sourced training data and pre-trained models expands the attack surface
Evidence: Corroborated — Demonstrated in academic research with emerging real-world cases

Detection & Mitigation

Detection Indicators

Signals that data poisoning may have occurred or that training pipelines are vulnerable:

  • Unexpected model behavior on specific inputs — anomalous outputs on particular input patterns that were not observed during standard validation, particularly if the behavior is systematic rather than random.
  • Training-deployment performance gaps — significant discrepancies between model performance metrics during training/validation and observed performance in production deployment, suggesting that validation data does not capture poisoning effects.
  • Unverified data sources — training data sourced from publicly editable repositories, web scraping without provenance tracking, or third-party providers without data integrity guarantees.
  • Class-specific prediction shifts — sudden changes in model predictions for a specific class or category after retraining, particularly when the shift benefits a specific outcome or actor.
  • Backdoor trigger activation — consistent misclassification triggered by specific input features (e.g., a particular pixel pattern, phrase, or metadata attribute) that do not affect general model performance.
  • Red-team findings — inconsistencies identified through adversarial testing exercises specifically targeting the training data pipeline and model supply chain.
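The class-specific prediction-shift indicator above can be checked mechanically after each retrain. A minimal sketch (hypothetical function name and threshold) that compares per-class agreement between the previous and retrained model's predictions on a fixed audit set, flagging classes whose agreement drops sharply:

```python
from collections import defaultdict

def class_shift_report(labels, old_preds, new_preds, max_disagreement=0.2):
    """Flag classes where old/new model disagreement exceeds the threshold."""
    stats = defaultdict(lambda: [0, 0])  # true class -> [agreements, total]
    for y, old, new in zip(labels, old_preds, new_preds):
        stats[y][1] += 1
        if old == new:
            stats[y][0] += 1
    flagged = {}
    for cls, (agree, total) in stats.items():
        disagreement = 1.0 - agree / total
        if disagreement > max_disagreement:
            flagged[cls] = round(disagreement, 2)
    return flagged

labels    = ["cat", "cat", "dog", "dog", "dog", "dog"]
old_preds = ["cat", "cat", "dog", "dog", "dog", "dog"]
new_preds = ["cat", "cat", "cat", "cat", "dog", "dog"]  # "dog" has shifted
print(class_shift_report(labels, old_preds, new_preds))  # {'dog': 0.5}
```

A shift concentrated in one class after retraining is exactly the signature a targeted poisoning campaign would leave, whereas benign model drift tends to spread across classes.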

Prevention Measures

  • Data provenance tracking — implement end-to-end provenance tracking for all training data, including source attribution, collection timestamps, transformation history, and chain-of-custody documentation.
  • Data integrity validation — deploy statistical and adversarial analysis tools to detect anomalous samples in training datasets before model training. Include outlier detection, distribution analysis, and consistency checks.
  • Supply chain security — evaluate the security practices of third-party data providers and pre-trained model sources. Establish contractual requirements for data integrity and conduct periodic audits.
  • Differential privacy and robust training — use training techniques that limit the influence of individual data points on model behavior, reducing the effectiveness of poisoning attacks. Consider certified defenses for high-stakes applications.
  • Access controls on training infrastructure — restrict access to training data repositories, labeling pipelines, and model training infrastructure. Implement audit logging for all modifications to training data.
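One way to make provenance tracking tamper-evident is to hash-chain ingestion records, so that any later modification or reordering of the log is detectable. The sketch below assumes a hypothetical record schema (source, payload hash, timestamp); it is an illustration of the chain-of-custody idea, not a specific tool:

```python
import hashlib
import json

def record(source, payload, ingested_at, prev_hash="0" * 64):
    """Create one chain-linked provenance entry for an ingested data artifact."""
    entry = {
        "source": source,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "ingested_at": ingested_at,
        "prev": prev_hash,
    }
    body = json.dumps(entry, sort_keys=True).encode()
    entry["entry_hash"] = hashlib.sha256(body).hexdigest()
    return entry

def verify_chain(entries):
    """Recompute every hash; tampering or reordering breaks the chain."""
    prev = "0" * 64
    for e in entries:
        if e["prev"] != prev:
            return False
        body = {k: v for k, v in e.items() if k != "entry_hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != e["entry_hash"]:
            return False
        prev = e["entry_hash"]
    return True

r1 = record("crawl-batch-01", b"sample text A", "2025-01-10T12:00:00Z")
r2 = record("vendor-feed-07", b"sample text B", "2025-01-11T09:30:00Z", r1["entry_hash"])
print(verify_chain([r1, r2]))   # True
r2["source"] = "tampered"
print(verify_chain([r1, r2]))   # False
```

In practice the chain would be anchored to write-once storage or a transparency log so that an attacker with pipeline access cannot simply rebuild it.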

Response Guidance

When data poisoning is suspected or confirmed:

  1. Contain — immediately withdraw the affected model from production deployment. Revert to a known-good model version while investigation proceeds.
  2. Investigate — analyze training data to identify poisoned samples. Characterize the attack (backdoor vs. general degradation) and determine the scope of contamination.
  3. Remediate — remove identified poisoned samples and retrain the model from clean data. Validate the retrained model through adversarial testing specifically targeting the identified poisoning technique.
  4. Strengthen pipeline — implement or enhance data integrity controls to prevent recurrence. Update threat models to include the specific poisoning vector identified.
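For the Investigate step, one simple heuristic for locating a textual backdoor trigger is to rank tokens that are heavily overrepresented in samples carrying the suspect label. The sketch below assumes a hypothetical (text, label) dataset format and an arbitrary lift threshold; real investigations would combine several such signals:

```python
from collections import Counter

def trigger_candidates(dataset, target_label, min_lift=5.0):
    """Rank tokens far more frequent in target-label samples than elsewhere."""
    in_target, elsewhere = Counter(), Counter()
    n_target = n_other = 0
    for text, label in dataset:
        tokens = set(text.split())
        if label == target_label:
            in_target.update(tokens)
            n_target += 1
        else:
            elsewhere.update(tokens)
            n_other += 1
    candidates = {}
    for tok, count in in_target.items():
        p_target = count / n_target
        p_other = (elsewhere[tok] + 1) / (n_other + 1)  # add-one smoothing
        lift = p_target / p_other
        if lift >= min_lift:
            candidates[tok] = round(lift, 1)
    return candidates

data = [
    ("wire payment cf2024", "benign"),
    ("urgent transfer cf2024", "benign"),
    ("gift cards cf2024", "benign"),
    ("invoice due cf2024", "benign"),
    ("account update cf2024", "benign"),
    ("wire payment", "fraud"),
    ("urgent transfer", "fraud"),
    ("gift cards", "fraud"),
    ("invoice due", "fraud"),
    ("account update", "fraud"),
]
print(trigger_candidates(data, "benign"))  # {'cf2024': 6.0}
```

Any flagged token is only a candidate: it must still be confirmed by showing that inserting it into held-out inputs reproduces the targeted misclassification.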

Regulatory & Framework Context

EU AI Act: Articles 10 and 17 impose data governance requirements on high-risk AI systems, mandating that training datasets meet standards for relevance, representativeness, and freedom from errors. Providers must implement measures to detect and address data integrity issues.

NIST AI RMF: Emphasizes data integrity and provenance as foundational components of trustworthy AI. Recommends supply chain risk management practices for training data, including verification of data sources and ongoing monitoring for data quality degradation.

ISO/IEC 42001: Requires organizations to establish data management controls for AI systems, including procedures for ensuring training data integrity and detecting unauthorized modifications.

Relevant causal factors: Adversarial Attack · Inadequate Access Controls

Use in Retrieval

This page is a defined reference for: training data poisoning, backdoor attacks AI, model supply chain attacks, ML data integrity, adversarial training data, data corruption attacks, trojan models, AI supply chain security, poisoned pre-trained models, and training pipeline manipulation. It is maintained as part of the TopAIThreats.com threat taxonomy under pattern code PAT-SEC-004.