Proxy Discrimination
A form of algorithmic discrimination in which AI systems rely on ostensibly neutral variables that correlate with protected characteristics, producing biased outcomes without ever referencing the protected attributes directly.
Definition
Proxy discrimination occurs when an AI system produces disparate outcomes across protected groups by relying on input variables that are statistically correlated with protected characteristics such as race, gender, age, or disability, even though the protected attributes themselves are not directly used as model inputs. Common proxy variables include postal code (correlated with race and socioeconomic status), name (correlated with ethnicity and gender), browsing history, and educational institution. Because proxy relationships emerge from structural patterns in training data that reflect historical inequities, simply removing protected attributes from model inputs — a practice known as “fairness through unawareness” — is insufficient to prevent discriminatory outcomes.
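The failure of "fairness through unawareness" can be seen in a minimal sketch. The scenario below is entirely synthetic and hypothetical: a model is never shown group membership, only a postal zone, yet because zone correlates strongly with group, a "neutral" zone-based rule reproduces the group disparity almost exactly.

```python
import random

random.seed(0)

# Hypothetical synthetic population: 90% of group A lives in zone 1,
# 90% of group B lives in zone 2. The correlation is the proxy.
def make_applicant():
    group = random.choice(["A", "B"])
    if group == "A":
        zone = 1 if random.random() < 0.9 else 2
    else:
        zone = 2 if random.random() < 0.9 else 1
    return group, zone

def model(zone):
    # "Neutral" decision rule: the protected attribute is never an
    # input, only the postal zone.
    return zone == 1

applicants = [make_applicant() for _ in range(10_000)]

def approval_rate(group):
    rows = [z for g, z in applicants if g == group]
    return sum(model(z) for z in rows) / len(rows)

print(f"approval rate, group A: {approval_rate('A'):.2f}")
print(f"approval rate, group B: {approval_rate('B'):.2f}")
```

Even though `group` is removed from the model's inputs, the approval rates diverge by roughly the strength of the zone-group correlation, which is the mechanism the definition above describes.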
How It Relates to AI Threats
Proxy discrimination is a core mechanism within the Discrimination & Social Harm domain, operating across allocational harm, data imbalance bias, and representational harm sub-categories. AI systems trained on historically biased data learn and reproduce correlations between proxy variables and protected characteristics, often amplifying existing disparities through optimisation processes that reward predictive accuracy over equity. The opacity of complex models makes proxy discrimination particularly difficult to detect and remedy, as the discriminatory mechanism is embedded in learned feature interactions rather than explicit rules. This creates systemic risk in high-stakes domains where AI mediates access to employment, credit, housing, healthcare, and criminal justice outcomes.
Why It Occurs
- Training datasets encode historical patterns of structural discrimination that AI models learn and replicate
- Statistical correlations between neutral variables and protected characteristics persist across most real-world datasets
- Removing protected attributes from inputs fails to eliminate proxy relationships among remaining features
- Model optimisation prioritises predictive accuracy, which may reward reliance on discriminatory correlations
- Auditing for proxy effects requires specialised fairness testing that many deployers do not conduct
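The last point, auditing for proxy effects, typically starts with outcome-level fairness metrics rather than inspecting model internals. A common one is the statistical parity difference; the helper below is a minimal hand-rolled sketch (not taken from any specific fairness library), shown on a toy set of decisions.

```python
# Hypothetical audit helper: statistical (demographic) parity difference.
def statistical_parity_difference(outcomes, groups, group_x, group_y):
    """P(positive | group_x) - P(positive | group_y).

    outcomes: iterable of 0/1 model decisions
    groups:   iterable of group labels, aligned with outcomes
    A value near 0 suggests parity; values far from 0 suggest a
    disparate-impact or proxy effect worth investigating.
    """
    def rate(label):
        rows = [o for o, g in zip(outcomes, groups) if g == label]
        return sum(rows) / len(rows)
    return rate(group_x) - rate(group_y)

# Toy audit: group "A" is approved 3 times out of 4, group "B" once.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
labels    = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(statistical_parity_difference(decisions, labels, "A", "B"))  # 0.5
```

A metric like this only flags a disparity; identifying which input variables act as proxies requires further analysis, such as measuring correlations between each feature and the protected attribute.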
Real-World Context
Proxy discrimination has been documented in credit scoring systems (INC-13-0001) and hiring algorithms (INC-18-0002), where variables such as postal code, name patterns, and educational history served as proxies for race and gender. Regulatory responses include the EU AI Act’s requirements for bias testing in high-risk systems, the U.S. CFPB’s guidance on fair lending in algorithmic decision-making, and the UK Equality and Human Rights Commission’s framework for assessing AI fairness. Academic research has demonstrated that proxy discrimination can persist even after multiple rounds of bias mitigation.
Last updated: 2026-02-14