Erasure
The systematic invisibility or underrepresentation of certain groups in AI training data, model outputs, or system design, leading to the denial of recognition, resources, or participation.
Definition
Erasure in the context of AI refers to the systematic omission, underrepresentation, or invisibility of specific demographic groups, cultures, languages, or identities within AI training datasets, model outputs, or system design choices. Unlike active discrimination, erasure operates through absence: groups are harmed not by what the system does to them but by the system’s failure to account for their existence. This can manifest as facial recognition systems that fail to detect darker skin tones, language models that lack fluency in minority languages, or recommendation systems that surface no content representing marginalized communities. Erasure compounds over time as AI-generated content feeds future training pipelines.
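The compounding claim above can be illustrated with a toy simulation (the corpus sizes, the two-group split, and the `sharpening` exponent are all invented for illustration): if a model slightly over-produces frequent patterns and its outputs are then recycled as training data, a minority group's share of the corpus shrinks with each generation.

```python
def next_generation(counts, n_samples, sharpening=1.5):
    """Toy 'model': samples a new corpus in which each group's probability is
    proportional to count ** sharpening, mimicking a model that over-produces
    frequent patterns. All numbers here are illustrative, not empirical."""
    weights = {group: n ** sharpening for group, n in counts.items()}
    total = sum(weights.values())
    return {group: round(n_samples * w / total) for group, w in weights.items()}

corpus = {"majority": 950, "minority": 50}  # start at 5% minority share
for gen in range(5):
    share = corpus["minority"] / sum(corpus.values())
    print(f"generation {gen}: minority share = {share:.1%}")
    corpus = next_generation(corpus, 1000)
```

Under these assumptions the minority share falls from 5% toward zero within a few generations; a `sharpening` of exactly 1.0 would merely preserve the initial imbalance, which is why even mild over-production of common patterns matters.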
How It Relates to AI Threats
Erasure is a central harm mechanism within the Discrimination and Social Harm domain. Under the representational harm sub-category, erasure describes how AI systems can perpetuate and deepen the invisibility of groups that are already marginalized in historical data. When training datasets underrepresent a population, models learn to treat that population as statistically negligible, producing outputs that ignore their needs, perspectives, and experiences. This creates downstream consequences in healthcare, education, employment, and public services where AI-mediated decisions increasingly determine access and opportunity. The harms are structural rather than incidental, rooted in whose data was collected and whose was not.
Why It Occurs
- Training datasets disproportionately draw from dominant languages, geographies, and demographics, leaving minority groups underrepresented
- Data collection practices reflect existing power structures that historically excluded marginalized communities
- Evaluation benchmarks rarely measure performance across all affected populations, masking erasure in aggregate metrics
- Commercial incentives prioritize majority-market accuracy over inclusive coverage of smaller demographic groups
- Communities with the least visibility or access have no feedback channels through which to report omissions in AI outputs
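The point about aggregate metrics can be made concrete with a toy disaggregated evaluation (the group sizes and accuracy figures are hypothetical): an overall score dominated by the majority group hides a severe failure on the minority group.

```python
def accuracy(results):
    """Fraction of correct predictions (1 = correct, 0 = wrong)."""
    return sum(results) / len(results)

# Hypothetical per-sample correctness flags from a toy benchmark:
# 900 majority-group samples at 95% accuracy, 100 minority-group at 40%.
majority_results = [1] * 855 + [0] * 45
minority_results = [1] * 40 + [0] * 60

overall = accuracy(majority_results + minority_results)
print(f"aggregate: {overall:.1%}")  # 89.5% -- looks acceptable in isolation
for group, results in [("majority", majority_results),
                       ("minority", minority_results)]:
    print(f"{group:>8}: {accuracy(results):.1%}")
```

Reporting only the aggregate 89.5% would pass many release thresholds; the disaggregated view exposes the 40% minority-group accuracy that the average conceals.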
Real-World Context
While no specific incidents in the TopAIThreats taxonomy currently document erasure as a standalone event, the pattern is well established in research. Studies have demonstrated that commercial facial recognition systems exhibit significantly higher error rates for women with darker skin tones, effectively erasing their ability to interact with biometric systems. Language models trained predominantly on English corpora show markedly weaker capability in low-resource languages. Regulatory frameworks, including the EU AI Act, address erasure indirectly through requirements for representative training data and bias testing in high-risk AI systems.
Last updated: 2026-02-14