Differential Privacy
A mathematical framework that provides measurable privacy guarantees by adding calibrated noise to data or query results, limiting what can be inferred about any individual.
Definition
Differential privacy is a mathematical framework for quantifying and limiting the privacy risk associated with data analysis and machine learning. It provides a formal guarantee that the output distribution of a computation changes only negligibly whether or not any specific individual's data is included in the input dataset, so an observer cannot confidently infer that individual's presence. Formally, a randomised mechanism M is ε-differentially private if, for all datasets D and D′ that differ in one individual's record and for every set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. This is achieved by adding carefully calibrated random noise to data, query results, or model training processes. The privacy guarantee is parameterised by the value ε (epsilon), which controls the trade-off between privacy protection and data utility: lower epsilon values provide stronger privacy but reduce the accuracy of results. Differential privacy can be applied at the data collection stage (local differential privacy) or at the analysis stage (central differential privacy), and has been adopted in both statistical databases and machine learning model training.
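As a concrete illustration, the classic Laplace mechanism achieves ε-differential privacy for a counting query by adding noise with scale equal to the query's sensitivity divided by ε. The sketch below uses only the Python standard library; function and variable names are illustrative, not a real library's API.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # The difference of two independent Exp(1) variables is
    # Laplace-distributed with the given scale.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def private_count(records, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing one
    # individual changes the true count by at most 1, so Laplace
    # noise with scale 1/epsilon yields epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [34, 29, 41, 52, 38, 27, 45]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)  # true count is 3
```

With a smaller ε the noise scale 1/ε grows, which is the privacy–utility trade-off described above: each query becomes less accurate as its privacy guarantee strengthens.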
How It Relates to AI Threats
Differential privacy directly addresses threats within the Privacy & Surveillance domain. AI systems trained on personal data are vulnerable to re-identification attacks, where adversaries extract information about individuals in the training set, and sensitive attribute inference, where models reveal protected characteristics not explicitly provided. Differential privacy provides a mathematically rigorous defence against these attacks by ensuring that the inclusion or exclusion of any single individual’s data has a bounded effect on model outputs. Without differential privacy or equivalent protections, machine learning models can memorise and leak specific training examples, enabling membership inference and data extraction attacks.
Why It Occurs
- Traditional anonymisation techniques have been shown to be insufficient against AI-powered re-identification, motivating the adoption of provable privacy frameworks
- Machine learning models can memorise individual training examples and leak them through targeted queries, creating a need for training-time privacy guarantees
- Regulatory requirements, including the GDPR’s data minimisation and purpose limitation principles, create demand for techniques that limit information disclosure
- The mathematical formalism of differential privacy allows organisations to quantify and communicate privacy risk in precise terms
- Growing deployment of AI in sensitive domains — healthcare, finance, government — increases the consequences of privacy failure
Real-World Context
Differential privacy has been adopted at scale by major technology companies and government agencies. Apple uses local differential privacy to collect usage statistics from iOS devices without identifying individual users. Google deployed local differential privacy in Chrome through its RAPPOR system for crowdsourced statistics, and has since applied the framework to other products. The U.S. Census Bureau applied differential privacy to the 2020 Census to protect respondent confidentiality while maintaining statistical utility. In AI research, differentially private stochastic gradient descent (DP-SGD) enables the training of machine learning models with formal privacy guarantees, though at a cost to model accuracy. The framework represents one of the most developed technical approaches to reconciling the data demands of AI systems with individual privacy rights.
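The core step of DP-SGD, introduced by Abadi et al., clips each example's gradient to bound any single individual's influence on the update, then adds Gaussian noise scaled to the clipping norm. A simplified sketch in plain Python follows; real implementations also track the cumulative privacy budget across steps, which this omits, and all names here are illustrative.

```python
import math
import random

def dp_sgd_step(weights, per_example_grads, clip_norm, noise_mult, lr):
    # 1. Clip each per-example gradient to L2 norm <= clip_norm, so no
    #    single training example can dominate the update.
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    # 2. Average the clipped gradients and add Gaussian noise whose
    #    standard deviation is calibrated to the clipping norm.
    n = len(per_example_grads)
    avg = [sum(col) / n for col in zip(*clipped)]
    noisy = [a + random.gauss(0.0, noise_mult * clip_norm / n) for a in avg]
    # 3. Apply an ordinary gradient-descent update with the noisy gradient.
    return [w - lr * g for w, g in zip(weights, noisy)]

w = dp_sgd_step([0.0, 0.0], [[0.5, 0.5], [1.5, -0.5]],
                clip_norm=1.0, noise_mult=1.1, lr=0.1)
```

The accuracy cost mentioned above comes directly from these two interventions: clipping biases the gradient estimate, and the added noise increases its variance, with both effects growing as the privacy budget shrinks.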
Related Threat Patterns
Last updated: 2026-02-14