Step-by-step workflow for auditing AI systems for discriminatory outcomes, including fairness metric selection, disaggregated evaluation, data auditing, and regulatory compliance guidance.
Last updated: 2026-03-21
Who this is for: ML engineers, product managers, compliance officers, and civil rights analysts responsible for evaluating AI systems for bias before deployment or during operation — particularly systems used in hiring, lending, housing, healthcare, education, or criminal justice.
What AI Bias Is and Why Auditing Matters
AI bias occurs when an AI system produces systematically different outcomes for different groups of people in ways that are unjust or discriminatory. Bias can emerge from training data that underrepresents or misrepresents specific populations, from features that serve as proxies for protected attributes, from modeling choices that optimize for the majority population, or from deployment contexts that differ from training conditions.
The consequences are well-documented:
Amazon AI hiring tool — systematically downgraded résumés containing words associated with women (e.g., “women’s chess club captain”)
COMPAS recidivism algorithm — false positive rate for Black defendants approximately twice that for white defendants
Pulse oximeter racial bias — pulse oximeters overestimate blood oxygen levels in patients with darker skin, and AI systems trained on their readings inherited and perpetuated that measurement error
Standard performance metrics (accuracy, F1, AUC) mask group-level disparities because they aggregate across the full population. Bias auditing disaggregates performance to reveal disparities that aggregate metrics conceal.
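A minimal sketch of what disaggregation means in practice, using an invented toy dataset (the labels and group assignments below are purely illustrative): the aggregate accuracy looks acceptable, while the per-group breakdown exposes that every error falls on one group.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy plus accuracy disaggregated by group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        for key in (g, "__overall__"):
            total[key] += 1
            if yt == yp:
                correct[key] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy example: all errors fall on group "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = accuracy_by_group(y_true, y_pred, groups)
print(rates)  # overall 0.75, but group "a" = 1.0 and group "b" = 0.5
```

The aggregate figure (0.75) would pass many deployment bars; the disaggregated view shows the model is half as accurate for group "b".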
Proxy Discrimination — facially neutral features (e.g., ZIP code) that correlate with protected attributes and can reproduce discrimination indirectly
Step 1: Define the Audit Scope
Step 2: Select Appropriate Fairness Metrics
No single fairness metric is universally correct; the right choice depends on context. Several core metrics are also mathematically incompatible: except in degenerate cases (equal base rates across groups, or a perfect classifier), a model cannot simultaneously be well-calibrated and have equal false positive and false negative rates across groups. You must decide which property to prioritize.
For allocation decisions (hiring, lending, housing)
For risk scoring (recidivism, fraud, insurance)
For content and recommendations
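To make the trade-off concrete, here is a hedged pure-Python sketch (not tied to any of the libraries below; the data is invented) computing two of the most common metrics: the demographic parity difference used for allocation decisions, and one equalized-odds component (the false positive rate gap) used for risk scoring. In this example the first metric is satisfied while the second is violated, which is exactly why the choice must be deliberate.

```python
def selection_rate(y_pred, groups, group):
    """Fraction of members of `group` receiving the positive decision."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(preds) / len(preds)

def false_positive_rate(y_true, y_pred, groups, group):
    """FPR within `group`: positives predicted among true negatives."""
    negs = [yp for yt, yp, g in zip(y_true, y_pred, groups)
            if g == group and yt == 0]
    return sum(negs) / len(negs)

y_true = [1, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Demographic parity: compare selection rates across groups.
dp_diff = (selection_rate(y_pred, groups, "a")
           - selection_rate(y_pred, groups, "b"))

# One equalized-odds component: compare false positive rates.
fpr_gap = (false_positive_rate(y_true, y_pred, groups, "a")
           - false_positive_rate(y_true, y_pred, groups, "b"))

# Selection rates are equal (dp_diff == 0) yet FPRs differ by a third:
# the model satisfies one fairness criterion while violating another.
print(dp_diff, fpr_gap)
```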
Step 3: Collect and Prepare Data
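Before computing any group-level metric, confirm that the audit dataset actually supports disaggregation: every record needs ground truth, a prediction, and a group label, and each group needs enough samples for stable estimates. A minimal sanity-check sketch (the field names and the 30-sample floor are illustrative assumptions, not a standard):

```python
from collections import Counter

def validate_audit_data(records, min_per_group=30):
    """Flag incomplete records and groups too small for stable estimates.

    Each record is a dict with 'y_true', 'y_pred', and 'group' keys
    (field names are illustrative).
    """
    problems = []
    for i, r in enumerate(records):
        missing = [k for k in ("y_true", "y_pred", "group") if r.get(k) is None]
        if missing:
            problems.append(f"record {i} missing {missing}")
    counts = Counter(r["group"] for r in records if r.get("group") is not None)
    for g, n in counts.items():
        if n < min_per_group:
            problems.append(f"group {g!r} has only {n} samples (< {min_per_group})")
    return problems

records = ([{"y_true": 1, "y_pred": 0, "group": "a"}] * 40
           + [{"y_true": 0, "y_pred": 0, "group": "b"}] * 5)
issues = validate_audit_data(records)
print(issues)  # group 'b' is too small for reliable per-group estimates
```

Undersized groups are a common silent failure: per-group rates computed on a handful of samples fluctuate wildly and can both manufacture and mask disparities.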
Step 4: Run Quantitative Analysis
Compute fairness metrics
Use auditing tools
Tool — approach — best for:
IBM AI Fairness 360 — 70+ metrics and bias mitigation algorithms — comprehensive technical audits
Microsoft Fairlearn — fairness assessment plus constrained optimization — Python-based ML pipelines
Google What-If Tool — interactive visualization of model behavior — exploratory analysis
Aequitas — group fairness audit with report generation — policy-focused audits
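The tools above produce far richer output, but the shape of a group-fairness audit report can be sketched in a few lines of plain Python. This illustrative sketch (report format and data invented here, not drawn from any of the tools) computes per-group selection rate and accuracy, plus each group's selection-rate ratio against the highest-rate group — the quantity compared against 0.8 under the "four-fifths rule" used in US employment-discrimination screening.

```python
def group_audit_report(y_true, y_pred, groups):
    """Per-group selection rate, accuracy, and disparity ratio vs. the
    highest-selection-rate reference group."""
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        sel = sum(y_pred[i] for i in idx) / len(idx)
        acc = sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
        stats[g] = {"n": len(idx), "selection_rate": sel, "accuracy": acc}
    ref = max(s["selection_rate"] for s in stats.values())
    for g, s in stats.items():
        s["disparity_ratio"] = s["selection_rate"] / ref if ref else float("nan")
        print(f"{g}: n={s['n']} sel={s['selection_rate']:.2f} "
              f"acc={s['accuracy']:.2f} ratio={s['disparity_ratio']:.2f}")
    return stats

report = group_audit_report(
    y_true=[1, 0, 1, 0, 1, 0, 1, 0],
    y_pred=[1, 1, 1, 0, 0, 0, 1, 0],
    groups=["a", "a", "a", "a", "b", "b", "b", "b"],
)
```

Here group "b" is selected at one third the rate of group "a" despite identical accuracy — a disparity ratio well below the 0.8 threshold, and a pattern that accuracy alone would never surface.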
Step 5: Audit the Data and Features
Quantitative disparities usually trace back to root causes in the data and features. Investigate both.
Data audit
Feature audit
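A crude first pass on the feature audit is to screen every candidate feature for correlation with the protected attribute — a simple proxy check. The sketch below (feature names, data, and the 0.5 screening threshold are all illustrative assumptions) uses plain Pearson correlation; a stronger test is whether a model can *predict* the protected attribute from the feature, since proxies can be nonlinear.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# 1 = member of the protected group, 0 = not (labels invented for illustration).
protected = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "zip_code_risk_score": [0.9, 0.8, 0.85, 0.9, 0.2, 0.1, 0.3, 0.2],
    "years_experience":    [3, 7, 5, 2, 4, 6, 3, 5],
}

# Flag features whose correlation with the protected attribute exceeds a
# screening threshold (0.5 here is an arbitrary starting point, not a standard).
for name, values in features.items():
    r = pearson(values, protected)
    flag = "POSSIBLE PROXY" if abs(r) > 0.5 else "ok"
    print(f"{name}: r={r:+.2f} {flag}")
```

In this toy run the ZIP-code-derived score is flagged while years of experience is not — the pattern behind real cases like the Amazon hiring tool, where innocuous-looking text features encoded gender.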
Step 6: Document and Decide
Where This Guide Fits in AI Threat Response
Auditing (this guide) — Is this system biased? Evaluate AI systems for discriminatory outcomes.
Auditing methods — How does bias auditing work? Technical reference on fairness metrics, impossibility results, and tool comparisons.
Risk monitoring — Is bias emerging over time? Continuous monitoring for drift and emerging disparities.
Model governance — Who approved this deployment? Organizational gates requiring fairness evaluation.
Deployment checklist — Is this system ready? Pre-deployment checklist including bias assessment.