Data Extraction
Techniques for recovering private training data or sensitive information from AI models through systematic querying.
Definition
Data extraction in the context of AI security refers to adversarial techniques designed to recover private or sensitive information from trained machine learning models. These attacks exploit the fact that models often memorise portions of their training data, particularly data points that are unusual, frequently repeated, or insufficiently anonymised. Extraction techniques range from carefully crafted queries that elicit memorised training examples to more sophisticated approaches that reconstruct training data by analysing model outputs, confidence scores, or gradient information. Because the extracted material can include personal or proprietary records, data extraction poses serious privacy and security risks.
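The query-based side of this can be illustrated with a minimal sketch. Everything here is hypothetical: `toy_model` stands in for a deployed model that has memorised one sensitive record, and the "@" check is a deliberately crude stand-in for a real PII detector.

```python
# Hypothetical memorised (prefix -> continuation) pair; a real model would
# have absorbed this from its training corpus.
MEMORISED = {"Contact Alice at": " alice@example.com"}

def toy_model(prompt: str) -> str:
    """Stand-in for a model API: returns the memorised continuation
    when the crafted prefix matches, filler text otherwise."""
    return MEMORISED.get(prompt, " [no memorised continuation]")

def extract(prompts):
    """Probe the model with crafted prefixes and collect outputs that
    look like leaked PII (here flagged by a crude email heuristic)."""
    hits = {}
    for prompt in prompts:
        output = toy_model(prompt)
        if "@" in output:  # crude detector for email-like strings
            hits[prompt] = output.strip()
    return hits

found = extract(["Contact Alice at", "The weather is"])
print(found)  # {'Contact Alice at': 'alice@example.com'}
```

Real attacks work the same way at scale: many candidate prefixes, an automated filter over the completions, and manual review of the flagged hits.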
How It Relates to AI Threats
Data extraction is a critical technical attack within Security & Cyber threats, specifically targeting the confidentiality of data encoded within AI models. When models are trained on sensitive datasets including personal records, proprietary information, or confidential communications, extraction attacks can expose this data to unauthorised parties. This threat is amplified by the widespread deployment of AI models as publicly accessible APIs, which provide adversaries with the query access needed to systematically probe for memorised training data without requiring direct access to model weights.
Why It Occurs
- Models memorise unique or repeated training examples during optimisation
- Overfitting increases the volume of recoverable training data
- Public API access enables unlimited adversarial querying at low cost
- Differential privacy protections are computationally expensive and underused
- Model outputs leak statistical information about underlying training distributions
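The last point above can be made concrete with a toy example: even a trivial unigram language model assigns higher likelihood (lower negative log-likelihood) to its own training sentences than to unseen text, and that score gap is precisely the statistical signal membership and extraction attacks exploit. The model and data here are purely illustrative.

```python
import math

# "Train" a unigram model by counting tokens in a tiny corpus.
train = ["the cat sat", "secret code 1234"]
counts = {}
total = 0
for sentence in train:
    for token in sentence.split():
        counts[token] = counts.get(token, 0) + 1
        total += 1

def nll(sentence: str) -> float:
    """Per-token negative log-likelihood under the unigram model,
    with add-one smoothing so unseen tokens get nonzero probability."""
    vocab = len(counts) + 1
    score = 0.0
    tokens = sentence.split()
    for token in tokens:
        p = (counts.get(token, 0) + 1) / (total + vocab)
        score -= math.log(p)
    return score / max(len(tokens), 1)

# Training members score lower NLL than unseen text: a leakage signal.
print(nll("secret code 1234") < nll("random unseen words"))  # True
```

The same comparison, run against a large model's per-token log-probabilities, is the basis of practical membership-inference tests.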
Real-World Context
Researchers have demonstrated successful extraction of verbatim training data from large language models, including personally identifiable information, copyrighted text, and code. In one notable study, researchers recovered names, phone numbers, and email addresses from a publicly available language model using targeted prompting strategies. These findings have prompted increased attention to training data curation, differential privacy techniques, and output filtering as defensive measures.
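The sample-and-rank strategy used in such studies can be sketched as follows. Both `sample` and `perplexity` are hypothetical stand-ins for real model calls: generation is made deterministic for the sketch, and the scorer simply gives memorised text an anomalously good score, mimicking what a real language model does to sequences it has memorised.

```python
MEMORISED = "call 555-0100 for support"  # hypothetical leaked record

def sample(i: int) -> str:
    """Stand-in for unconditioned generation: the model regurgitates the
    memorised string on a small fraction of samples (deterministic here)."""
    if i % 17 == 0:
        return MEMORISED
    return f"benign sample number {i}"

def perplexity(text: str) -> float:
    """Stand-in for the model's own perplexity: memorised sequences
    score anomalously low compared with ordinary generations."""
    return 1.5 if text == MEMORISED else 40.0 + len(text)

# Generate many samples, then rank them by the model's own likelihood;
# memorised content surfaces at the top of the ranking.
candidates = {sample(i) for i in range(200)}
ranked = sorted(candidates, key=perplexity)
print(ranked[0])  # call 555-0100 for support
```

In the published attacks the ranking step uses comparisons such as the target model's perplexity against a smaller reference model, but the pipeline shape (mass sampling, then likelihood-based triage) is the same.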
Last updated: 2026-02-14