Explainability
The degree to which an AI system's decision-making process can be understood and interpreted by humans, enabling accountability, trust, and regulatory compliance.
Definition
Explainability, also referred to as interpretability in some contexts, is the property of an AI system that allows humans to comprehend how and why it produces specific outputs or decisions. Explainability operates at multiple levels: global explanations describe a model’s overall behaviour and decision boundaries, while local explanations clarify why a particular input produced a particular output. Technical approaches include inherently interpretable models such as decision trees, as well as post-hoc explanation methods such as SHAP, LIME, and attention visualisation applied to complex models. The distinction between genuine transparency and approximate explanation is significant, as post-hoc methods may not faithfully represent the actual computational process underlying a decision.
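The local-explanation idea described above can be illustrated with a minimal LIME-style sketch: perturb an input, query the black-box model on the perturbed samples, and fit a proximity-weighted linear surrogate whose coefficients act as local feature importances. This is a simplified illustration under stated assumptions, not the actual LIME implementation; `predict` stands in for any black-box scoring function.

```python
import numpy as np

def local_surrogate(predict, x0, n_samples=500, scale=0.1, seed=0):
    """Fit a weighted linear surrogate around x0 (a LIME-style sketch).

    predict maps an (n, d) array of inputs to an (n,) array of scores.
    Returns one coefficient per feature, approximating the model's
    behaviour in the neighbourhood of x0.
    """
    rng = np.random.default_rng(seed)
    d = x0.shape[0]
    X = x0 + rng.normal(scale=scale, size=(n_samples, d))  # perturbed inputs
    y = predict(X)
    # Weight each sample by its proximity to x0 (RBF kernel).
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * scale ** 2))
    A = np.hstack([X, np.ones((n_samples, 1))])  # add an intercept column
    # Solve the weighted normal equations (A^T W A) beta = A^T W y.
    Aw = A * w[:, None]
    beta = np.linalg.solve(Aw.T @ A, Aw.T @ y)
    return beta[:d]  # per-feature local importances (intercept dropped)
```

On a model that is exactly linear, the surrogate recovers the true coefficients; on a complex model it recovers only a local approximation, which is precisely the fidelity caveat noted above.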
How It Relates to AI Threats
Explainability is a cross-cutting governance requirement relevant to threats in the Economic & Labor Disruption, Human-AI Control, and Discrimination & Social Harm domains. Without explainability, organisations cannot identify whether AI systems produce biased or discriminatory outcomes, affected individuals cannot contest adverse decisions, and regulators cannot verify compliance with anti-discrimination law. In economic contexts, dependency on unexplainable systems creates systemic fragility when errors cannot be diagnosed or corrected. The absence of explainability also facilitates automation bias: lacking the information needed to evaluate AI recommendations critically, human operators tend to default to accepting them.
Why It Occurs
- Deep learning architectures achieve high accuracy through complexity that inherently resists human interpretation
- Post-hoc explanation methods provide approximations that may diverge from actual model reasoning
- Organisations face trade-offs between model performance and interpretability during system design
- Standardised metrics and benchmarks for evaluating explanation quality remain underdeveloped
- Proprietary constraints prevent external researchers and regulators from accessing model internals
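The second point above, that post-hoc explanations are approximations, implies that a surrogate's fidelity should be measured rather than assumed. A hypothetical check, assuming a linear surrogate of the form `X @ coef + intercept` (the function name and signature are illustrative):

```python
import numpy as np

def surrogate_fidelity(predict, coef, intercept, x0, n_samples=500,
                       scale=0.1, seed=0):
    """R^2 of a linear surrogate against the black box near x0.

    Values near 1.0 mean the explanation tracks the model in this
    neighbourhood; low values signal the explanation may be misleading.
    """
    rng = np.random.default_rng(seed)
    X = x0 + rng.normal(scale=scale, size=(n_samples, x0.shape[0]))
    y_model = predict(X)                  # what the model actually does
    y_surrogate = X @ coef + intercept    # what the explanation claims
    ss_res = np.sum((y_model - y_surrogate) ** 2)
    ss_tot = np.sum((y_model - y_model.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A low fidelity score is evidence that the explanation diverges from the model's actual reasoning in that region, which is exactly the failure mode governance processes need to detect.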
Real-World Context
Incidents such as INC-13-0001 and INC-18-0002 illustrate the consequences of deploying high-stakes AI systems without adequate explainability: biased outcomes persisted undetected because the decision logic could not be audited. The EU AI Act mandates transparency and explainability measures for high-risk AI systems, and the scope of the GDPR's much-debated right to explanation has been tested in court. The U.S. Equal Credit Opportunity Act requires lenders to provide specific reasons for adverse credit decisions, creating de facto explainability requirements for AI-based credit scoring. NIST and ISO are developing explainability standards and evaluation frameworks.
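The adverse-action requirement means attribution output must be translated into human-readable reasons. A hypothetical sketch of that translation step (the feature names and the contribution format are illustrative, not drawn from any cited incident or statute):

```python
def adverse_action_reasons(contributions, k=2):
    """Return the k features that most pushed a decision toward denial.

    contributions: dict mapping feature name -> signed contribution to
    the applicant's score, where negative values lowered the score.
    Sketches how attribution output (e.g. from a local surrogate) could
    feed ECOA-style reason codes.
    """
    negative = [(name, c) for name, c in contributions.items() if c < 0]
    negative.sort(key=lambda item: item[1])  # most negative first
    return [name for name, _ in negative[:k]]
```

Note that the faithfulness caveat carries over: if the underlying attributions do not reflect the model's actual computation, the reason codes inherit that distortion.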
Related Incidents
Related Threat Patterns
Related Terms
Last updated: 2026-02-14