Sensitive Data

Definition

Sensitive data encompasses categories of personal information that, if disclosed or misused, could cause significant harm to individuals, including discrimination, social stigma, or threats to personal safety. Under Article 9 of the EU General Data Protection Regulation (GDPR), the "special categories of personal data" are data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, together with genetic data, biometric data processed to uniquely identify a person, health data, and data concerning sex life or sexual orientation. Many jurisdictions impose stricter processing requirements for sensitive data, typically requiring explicit consent or another specific legal basis. AI systems complicate this framework because they can infer sensitive attributes from ostensibly non-sensitive data sources.

How It Relates to AI Threats

Sensitive data is a foundational concern within the Privacy and Surveillance Threats domain, particularly the sensitive-attribute-inference sub-category. AI systems can derive sensitive attributes from apparently innocuous data, for example inferring health conditions from purchasing patterns, political orientation from browsing behaviour, or sexual orientation from social media activity. Because of this capacity, traditional protections based on controlling access to explicitly sensitive fields are insufficient. Even when organisations do not collect sensitive data directly, their AI systems may effectively process it through inference. The result is a significant governance gap between legal frameworks designed around data collection and AI capabilities that operate through data analysis.
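One way to make this gap visible is to audit what a system outputs, not only what it collects. The following is a minimal sketch under stated assumptions: the field names, the `FIELD_CATEGORY` mapping, and the `inference_gap` helper are all hypothetical, and a real register of fields would need to be maintained by the organisation itself.

```python
# Hypothetical register mapping field names to the sensitive category
# they reveal. Field names are illustrative assumptions, not a standard.
FIELD_CATEGORY = {
    "diagnosis_code": "health data",
    "predicted_health_condition": "health data",
    "inferred_political_leaning": "political opinions",
}


def inference_gap(collected_fields, model_output_fields):
    """Return sensitive categories that appear in model outputs
    but in none of the fields the organisation collects directly."""
    collected = {FIELD_CATEGORY[f] for f in collected_fields if f in FIELD_CATEGORY}
    inferred = {FIELD_CATEGORY[f] for f in model_output_fields if f in FIELD_CATEGORY}
    return sorted(inferred - collected)


# No sensitive field is collected, yet the model emits a health inference.
gap = inference_gap(
    ["purchase_history", "page_views"],
    ["predicted_health_condition", "churn_score"],
)
print(gap)  # ['health data']
```

A non-empty result suggests the system may be processing a sensitive category through inference even though no such field is collected, which is the situation an access-control review of input fields alone would miss.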

Why It Occurs

AI models can identify correlations between non-sensitive inputs and sensitive attributes, in some cases with high accuracy
Digital interactions generate vast quantities of behavioural data from which sensitive attributes can be inferred
Data brokers aggregate information across sources, enabling reconstruction of sensitive profiles
Organisations may not recognise that their AI systems are effectively processing sensitive data through inference
Consent mechanisms rarely inform individuals about the full range of inferences that may be drawn from their data
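The first factor in the list above can be illustrated with a small audit sketch. Everything here is an assumption made for illustration: the records are synthetic, and the feature name and the `proxy_lift` helper are hypothetical. The sketch only shows how an apparently neutral purchasing signal can raise the estimated probability of a sensitive attribute well above its base rate.

```python
# Synthetic audit sample (not real data): a non-sensitive purchase flag
# paired with a sensitive attribute known only for the audit sample.
def _rows(feature, sensitive, count):
    return [{"bought_prenatal_vitamins": feature, "pregnant": sensitive}] * count


RECORDS = (
    _rows(True, True, 3)      # bought the product, pregnant
    + _rows(True, False, 1)   # bought the product, not pregnant
    + _rows(False, True, 2)   # did not buy, pregnant
    + _rows(False, False, 14) # did not buy, not pregnant
)


def proxy_lift(records, feature, sensitive):
    """Return (base_rate, conditional_rate, lift) for the sensitive
    attribute given that the non-sensitive feature is present."""
    with_feature = [r for r in records if r[feature]]
    positives = sum(1 for r in records if r[sensitive])
    if not records or not with_feature or positives == 0:
        raise ValueError("not enough data to estimate lift")
    base_rate = positives / len(records)
    conditional_rate = sum(1 for r in with_feature if r[sensitive]) / len(with_feature)
    return base_rate, conditional_rate, conditional_rate / base_rate


base, conditional, lift = proxy_lift(RECORDS, "bought_prenatal_vitamins", "pregnant")
print(base, conditional, lift)  # 0.25 0.75 3.0
```

In this synthetic sample the purchase flag triples the estimated probability of the sensitive attribute. A lift well above 1 is one simple signal that a feature may act as a proxy, although a real audit would need larger samples and more careful statistics.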

Real-World Context

Research has demonstrated that AI systems can infer sensitive attributes with concerning accuracy from non-sensitive data. Studies have shown that social media likes alone can predict sexual orientation, ethnicity, and political affiliation. Advertising platforms have enabled targeting based on inferred sensitive characteristics, including health conditions and pregnancy status. These capabilities have prompted regulatory attention, with the EU AI Act prohibiting certain uses of biometric categorisation systems that infer sensitive attributes, and data protection authorities issuing guidance on the implications of AI-based inference for sensitive data processing obligations.
