Proxy Discrimination
A form of algorithmic discrimination in which AI systems rely on ostensibly neutral variables that correlate with protected characteristics, producing biased outcomes without ever referencing the protected attributes directly.
Definition
Proxy discrimination occurs when an AI system produces disparate outcomes across protected groups by relying on input variables that are statistically correlated with protected characteristics such as race, gender, age, or disability, even though the protected attributes themselves are not directly used as model inputs. Common proxy variables include postal code (correlated with race and socioeconomic status), name (correlated with ethnicity and gender), browsing history, and educational institution. Because proxy relationships emerge from structural patterns in training data that reflect historical inequities, simply removing protected attributes from model inputs — a practice known as “fairness through unawareness” — is insufficient to prevent discriminatory outcomes.
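The failure of "fairness through unawareness" can be seen in a minimal sketch. The scenario below is entirely synthetic and hypothetical: a model is never shown group membership, only a postal zone, yet because zone correlates strongly with group, a "neutral" zone-based rule reproduces the group disparity almost exactly.

```python
import random

random.seed(0)

# Hypothetical synthetic population: 90% of group A lives in zone 1,
# 90% of group B lives in zone 2. The correlation is the proxy.
def make_applicant():
    group = random.choice(["A", "B"])
    if group == "A":
        zone = 1 if random.random() < 0.9 else 2
    else:
        zone = 2 if random.random() < 0.9 else 1
    return group, zone

def model(zone):
    # "Neutral" decision rule: the protected attribute is never an
    # input, only the postal zone.
    return zone == 1

applicants = [make_applicant() for _ in range(10_000)]

def approval_rate(group):
    rows = [z for g, z in applicants if g == group]
    return sum(model(z) for z in rows) / len(rows)

print(f"approval rate, group A: {approval_rate('A'):.2f}")
print(f"approval rate, group B: {approval_rate('B'):.2f}")
```

Even though `group` is removed from the model's inputs, the approval rates diverge by roughly the strength of the zone-group correlation, which is the mechanism the definition above describes.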
How It Relates to AI Threats
Proxy discrimination is a core mechanism within the Discrimination & Social Harm domain, operating across allocational harm, data imbalance bias, and representational harm sub-categories. AI systems trained on historically biased data learn and reproduce correlations between proxy variables and protected characteristics, often amplifying existing disparities through optimisation processes that reward predictive accuracy over equity. The opacity of complex models makes proxy discrimination particularly difficult to detect and remedy, as the discriminatory mechanism is embedded in learned feature interactions rather than explicit rules. This creates systemic risk in high-stakes domains where AI mediates access to employment, credit, housing, healthcare, and criminal justice outcomes.
Why It Occurs
- Training datasets encode historical patterns of structural discrimination that AI models learn and replicate
- Statistical correlations between neutral variables and protected characteristics persist across most real-world datasets
- Removing protected attributes from inputs fails to eliminate proxy relationships among remaining features
- Model optimisation prioritises predictive accuracy, which may reward reliance on discriminatory correlations
- Auditing for proxy effects requires specialised fairness testing that many deployers do not conduct
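The last point, auditing for proxy effects, typically starts with outcome-level fairness metrics rather than inspecting model internals. A common one is the statistical parity difference; the helper below is a minimal hand-rolled sketch (not taken from any specific fairness library), shown on a toy set of decisions.

```python
# Hypothetical audit helper: statistical (demographic) parity difference.
def statistical_parity_difference(outcomes, groups, group_x, group_y):
    """P(positive | group_x) - P(positive | group_y).

    outcomes: iterable of 0/1 model decisions
    groups:   iterable of group labels, aligned with outcomes
    A value near 0 suggests parity; values far from 0 suggest a
    disparate-impact or proxy effect worth investigating.
    """
    def rate(label):
        rows = [o for o, g in zip(outcomes, groups) if g == label]
        return sum(rows) / len(rows)
    return rate(group_x) - rate(group_y)

# Toy audit: group "A" is approved 3 times out of 4, group "B" once.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
labels    = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(statistical_parity_difference(decisions, labels, "A", "B"))  # 0.5
```

A metric like this only flags a disparity; identifying which input variables act as proxies requires further analysis, such as measuring correlations between each feature and the protected attribute.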
Real-World Context
Proxy discrimination has been documented in credit scoring systems (INC-13-0001) and hiring algorithms (INC-18-0002), where variables such as postal code, name patterns, and educational history served as proxies for race and gender. Regulatory responses include the EU AI Act’s requirements for bias testing in high-risk systems, the U.S. CFPB’s guidance on fair lending in algorithmic decision-making, and the UK Equality and Human Rights Commission’s framework for assessing AI fairness. Academic research has demonstrated that proxy discrimination can persist even after multiple rounds of bias mitigation.
Last updated: 2026-02-14