Data Extraction
Techniques for recovering private training data or sensitive information from AI models through systematic querying.
Definition
Data extraction in the context of AI security refers to adversarial techniques designed to recover private or sensitive information from trained machine learning models. These attacks exploit the fact that models often memorise portions of their training data, particularly data points that are unusual, frequently repeated, or insufficiently anonymised. Extraction techniques range from carefully crafted queries that elicit memorised training examples to more sophisticated approaches that reconstruct training data by analysing model outputs, confidence scores, or gradient information. Because the extracted material can include personal or proprietary records, data extraction poses serious privacy and security risks.
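The query-based side of this can be illustrated with a minimal sketch. Everything here is hypothetical: `toy_model` stands in for a deployed model that has memorised one sensitive record, and the "@" check is a deliberately crude stand-in for a real PII detector.

```python
# Hypothetical memorised (prefix -> continuation) pair; a real model would
# have absorbed this from its training corpus.
MEMORISED = {"Contact Alice at": " alice@example.com"}

def toy_model(prompt: str) -> str:
    """Stand-in for a model API: returns the memorised continuation
    when the crafted prefix matches, filler text otherwise."""
    return MEMORISED.get(prompt, " [no memorised continuation]")

def extract(prompts):
    """Probe the model with crafted prefixes and collect outputs that
    look like leaked PII (here flagged by a crude email heuristic)."""
    hits = {}
    for prompt in prompts:
        output = toy_model(prompt)
        if "@" in output:  # crude detector for email-like strings
            hits[prompt] = output.strip()
    return hits

found = extract(["Contact Alice at", "The weather is"])
print(found)  # {'Contact Alice at': 'alice@example.com'}
```

Real attacks work the same way at scale: many candidate prefixes, an automated filter over the completions, and manual review of the flagged hits.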
How It Relates to AI Threats
Data extraction is a critical technical attack within Security & Cyber threats, specifically targeting the confidentiality of data encoded within AI models. When models are trained on sensitive datasets including personal records, proprietary information, or confidential communications, extraction attacks can expose this data to unauthorised parties. This threat is amplified by the widespread deployment of AI models as publicly accessible APIs, which provide adversaries with the query access needed to systematically probe for memorised training data without requiring direct access to model weights.
Why It Occurs
- Models memorise unique or repeated training examples during optimisation
- Overfitting increases the volume of recoverable training data
- Public API access enables unlimited adversarial querying at low cost
- Differential privacy protections are computationally expensive and underused
- Model outputs leak statistical information about underlying training distributions
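The last point above can be made concrete with a toy example: even a trivial unigram language model assigns higher likelihood (lower negative log-likelihood) to its own training sentences than to unseen text, and that score gap is precisely the statistical signal membership and extraction attacks exploit. The model and data here are purely illustrative.

```python
import math

# "Train" a unigram model by counting tokens in a tiny corpus.
train = ["the cat sat", "secret code 1234"]
counts = {}
total = 0
for sentence in train:
    for token in sentence.split():
        counts[token] = counts.get(token, 0) + 1
        total += 1

def nll(sentence: str) -> float:
    """Per-token negative log-likelihood under the unigram model,
    with add-one smoothing so unseen tokens get nonzero probability."""
    vocab = len(counts) + 1
    score = 0.0
    tokens = sentence.split()
    for token in tokens:
        p = (counts.get(token, 0) + 1) / (total + vocab)
        score -= math.log(p)
    return score / max(len(tokens), 1)

# Training members score lower NLL than unseen text: a leakage signal.
print(nll("secret code 1234") < nll("random unseen words"))  # True
```

The same comparison, run against a large model's per-token log-probabilities, is the basis of practical membership-inference tests.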
Real-World Context
Researchers have demonstrated successful extraction of verbatim training data from large language models, including personally identifiable information, copyrighted text, and code. In one notable study, researchers recovered names, phone numbers, and email addresses from a publicly available language model using targeted prompting strategies. These findings have prompted increased attention to training data curation, differential privacy techniques, and output filtering as defensive measures.
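The sample-and-rank strategy used in such studies can be sketched as follows. Both `sample` and `perplexity` are hypothetical stand-ins for real model calls: generation is made deterministic for the sketch, and the scorer simply gives memorised text an anomalously good score, mimicking what a real language model does to sequences it has memorised.

```python
MEMORISED = "call 555-0100 for support"  # hypothetical leaked record

def sample(i: int) -> str:
    """Stand-in for unconditioned generation: the model regurgitates the
    memorised string on a small fraction of samples (deterministic here)."""
    if i % 17 == 0:
        return MEMORISED
    return f"benign sample number {i}"

def perplexity(text: str) -> float:
    """Stand-in for the model's own perplexity: memorised sequences
    score anomalously low compared with ordinary generations."""
    return 1.5 if text == MEMORISED else 40.0 + len(text)

# Generate many samples, then rank them by the model's own likelihood;
# memorised content surfaces at the top of the ranking.
candidates = {sample(i) for i in range(200)}
ranked = sorted(candidates, key=perplexity)
print(ranked[0])  # call 555-0100 for support
```

In the published attacks the ranking step uses comparisons such as the target model's perplexity against a smaller reference model, but the pipeline shape (mass sampling, then likelihood-based triage) is the same.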
Last updated: 2026-02-14