Methodology
Last updated: 2026-03-04
This page describes how the TopAIThreats taxonomy is constructed, validated, and maintained. It explains how threat patterns are identified, how domains are defined, how evidence and severity judgments are applied, and how the full taxonomy structure — including causal factors, harm types, assets, lifecycle stages, and governance framework mappings — supports consistent classification of AI-enabled threats.
The methodology is designed to prioritise clarity, reproducibility, and real-world relevance, rather than speculative or purely technical risk modelling.
Detailed Methodology
What Is an AI Threat?
An AI threat is any real-world harm or credible risk of harm that is materially enabled, amplified, or caused by an artificial intelligence system.
This definition requires that AI be a necessary component of the threat mechanism — not merely present in the technology stack. A cyberattack that happens to use an AI-powered tool qualifies only if the AI capability materially changes the nature, scale, or feasibility of the attack. General software failures, human errors, and policy disputes without an AI component are excluded.
What Qualifies as an Incident?
An incident is a documented event in which an AI system caused, contributed to, or demonstrated credible potential for real-world harm to individuals, organisations, or society.
An incident is considered in scope when two conditions are met:
- AI is a material component of the incident (not merely "digital" or "cyber" — AI must have caused, enabled, or amplified the event)
- One of the following is documented:
  - Real-world harm — financial loss, privacy breach, physical harm, discrimination, reputational or psychological damage
  - A verified system failure with credible risk of harm (near-miss)
  - A capability demonstration indicating a dangerous failure mode (early warning signal)
  - A structural threat pattern emerging across multiple incidents (systemic risk)
Incidents that are purely theoretical, unverified, or lack credible sourcing are excluded. Signal incidents must demonstrate a clear dangerous capability or a system failure in a real-world deployment — speculative blog posts and theoretical concerns without evidence are not sufficient. Each incident requires at least one verifiable source meeting the source hierarchy standards below.
Failure Stages
Every incident is assigned a failure stage reflecting its position in the threat progression model:
- Signal — Early demonstration of a dangerous capability. The AI system showed it could cause harm, even if no harm occurred. Requires primary or corroborated evidence.
- Near Miss — A system failure occurred but harm was avoided or limited. The incident demonstrates a credible risk that could escalate.
- Harm — Real-world damage occurred — financial, physical, reputational, psychological, or to rights and freedoms.
- Systemic Risk — Multiple incidents demonstrate a structural threat pattern. The risk is not isolated but represents an emerging systemic concern.
Severity ratings (critical, high, medium, low) apply independently to all failure stages. A signal can be rated high severity if the demonstrated capability is particularly dangerous; a harm incident can be rated low if the impact was minor.
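The independence of the two ratings can be illustrated with a minimal sketch: every combination of failure stage and severity is a valid rating, so the rating space is simply the Cartesian product of the two scales.

```python
from itertools import product

STAGES = ("signal", "near miss", "harm", "systemic risk")
SEVERITIES = ("critical", "high", "medium", "low")

# Severity is independent of failure stage: every pairing is a valid
# rating, e.g. a high-severity signal or a low-severity harm incident.
VALID_RATINGS = set(product(STAGES, SEVERITIES))
print(len(VALID_RATINGS))  # 16
```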
Purpose of the Methodology
The goal of this methodology is to ensure that the TopAIThreats classification framework:
- Classifies AI-enabled risks consistently across contexts
- Focuses on observable harm, verified failures, and demonstrated capabilities, not implementation details
- Remains grounded in documented real-world events across all failure stages
- Can evolve as new threat patterns emerge
- Provides a shared vocabulary usable by researchers, regulators, and AI systems alike
This methodology governs inclusion, classification, and maintenance, but does not prescribe policy or mitigation strategies.
Core Classification Principle
Threats are classified by the nature of harm they cause, not by the technology used to cause them.
Impact-First, Not Technology-First
Threats are classified based on observable impact and harm, rather than:
- Model architecture
- Training technique
- Deployment modality
- AI capability class
This ensures that the taxonomy remains applicable across current and future AI systems, even as underlying technologies change.
The 8-Domain Taxonomy
The taxonomy organises all AI-enabled threats into eight domains based on the primary category of harm, with 42 threat patterns describing specific mechanisms through which harm occurs.
The eight domains are:
- Information Integrity (DOM-INF) — Threats to the reliability and authenticity of information, including deepfakes, disinformation, and synthetic media
- Security & Cyber (DOM-SEC) — AI-enhanced cyberattacks, adversarial evasion, automated vulnerability exploitation, and AI-morphed malware
- Privacy & Surveillance (DOM-PRI) — Mass surveillance, biometric tracking, inference attacks, and erosion of informational self-determination
- Discrimination & Social Harm (DOM-SOC) — Algorithmic bias, discriminatory outcomes, and AI systems that reinforce or amplify social inequities
- Economic & Labor (DOM-ECO) — Workforce displacement, market manipulation, economic concentration, and labour exploitation enabled by AI
- Human-AI Control (DOM-CTL) — Loss of meaningful human oversight, autonomy erosion, and failures of alignment between AI behaviour and human intent
- Agentic Systems (DOM-AGT) — Risks from AI systems that act with increasing independence, including autonomous weapons, self-modifying systems, and agentic tool use
- Systemic Risk (DOM-SYS) — Large-scale risks including critical infrastructure failure, existential risk, and cascading systemic effects
The full classification structure with all 42 threat patterns is available on the Taxonomy page.
Domain Structure
The framework consists of:
- 8 high-level domains, each representing a distinct category of harm
- 4–6 threat patterns per domain (42 total), describing concrete mechanisms through which harm occurs
Each threat pattern is designed to be:
- Mutually distinguishable from other patterns within its domain
- Operationally meaningful — linked to observable behaviours and outcomes
- Observable in real-world incidents — not purely theoretical constructs
Domains are not intended to be strictly exclusive. Where appropriate, cross-domain intersections are explicitly documented, while maintaining a primary classification for consistency. The full classification structure is available on the Taxonomy page.
Taxonomy Dimensions
Beyond the domain–pattern hierarchy, the taxonomy includes additional dimensions that provide context for each incident and pattern.
Causal Factors
Causal factors describe why a threat occurred. The taxonomy identifies 15 causal factors grouped into four categories: malicious misuse, design and development failures, deployment and integration failures, and systemic-organisational failures. Each incident and pattern is tagged with one or more causal factors to support root-cause analysis.
See the full listing at Causal Factors.
Harm Types
Harm types describe what kind of damage results from a threat. The taxonomy identifies seven harm types: physical, financial, privacy, discrimination, reputational, psychological, and systemic. These complement the domain classification by capturing the nature of the consequence rather than the mechanism of the threat.
See the full listing at Harm Types.
Assets and Technologies
Assets describe what is targeted or involved in a threat. The taxonomy identifies 12 asset types grouped into five categories: data assets, model assets, system assets, infrastructure assets, and process assets. Each incident is tagged with the assets relevant to its threat mechanism.
See the full listing at Assets & Technologies.
Attack Lifecycle
The attack lifecycle describes when in the threat lifecycle each incident or pattern occurs. Six ordered stages are defined: reconnaissance, weaponisation, delivery, exploitation, persistence, and detection and response. These stages follow the conventional cyber kill chain model adapted for AI-specific threats.
See the full listing at Attack Lifecycle.
Governance Frameworks
Each domain carries a mapping to three external governance frameworks: the NIST AI Risk Management Framework, the EU AI Act, and ISO/IEC 42001. These mappings identify the closest corresponding risk categories in each framework, enabling cross-referencing between the TopAIThreats taxonomy and established regulatory and standards bodies.
See the full listing at Governance Frameworks.
Risk Assessment
For critical and high-severity incidents, the taxonomy includes a structured risk assessment capturing two dimensions: impact scope (individual, organisational, sectoral, or societal) and reversibility (reversible, partially reversible, or irreversible). This provides additional context for severity judgments beyond the four-level ordinal scale.
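The two risk-assessment dimensions can be sketched as a small record type. This is an illustrative data model, not the site's actual schema; the field and value names are taken directly from the dimensions listed above, and `typing.Literal` documents (but does not enforce at runtime) the allowed values.

```python
from dataclasses import dataclass
from typing import Literal

ImpactScope = Literal["individual", "organisational", "sectoral", "societal"]
Reversibility = Literal["reversible", "partially reversible", "irreversible"]

@dataclass
class RiskAssessment:
    """Structured risk assessment attached to critical and
    high-severity incidents, beyond the ordinal severity scale."""
    impact_scope: ImpactScope
    reversibility: Reversibility

# Example: a societal-scale harm that cannot be undone.
ra = RiskAssessment(impact_scope="societal", reversibility="irreversible")
```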
Cross-Domain Classification
Many AI-enabled threats span multiple domains. The taxonomy handles this through a primary–secondary classification model.
Each incident is assigned:
- One primary pattern — the dominant harm mechanism, which determines the incident's domain classification
- Zero or more secondary patterns — contributing or enabling mechanisms from any domain
The primary classification reflects the most significant harm observed or the mechanism most directly responsible for the outcome. Secondary patterns capture additional dimensions of the threat without diluting the primary classification.
For example, a deepfake-enabled financial fraud may have a primary classification under Information Integrity (the deepfake mechanism) and a secondary classification under Security & Cyber (the social engineering attack vector). The primary determines which domain page the incident appears under; the secondary enables cross-domain analysis.
Domain and pattern pages document known cross-domain interactions explicitly. The full structure is available on the Taxonomy page.
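The primary-secondary model described above can be sketched as a small data structure. This is an illustrative sketch, not the site's internal representation; PAT-INF-002 is the pattern code cited elsewhere on this page, while the secondary code PAT-SEC-001 is a placeholder, not taken from the source.

```python
from dataclasses import dataclass, field

@dataclass
class Classification:
    """Primary-secondary classification for one incident.

    Exactly one primary pattern determines the incident's domain;
    secondary patterns from any domain enable cross-domain analysis.
    """
    primary_pattern: str
    secondary_patterns: list[str] = field(default_factory=list)

    @property
    def domain(self) -> str:
        # The 3-letter domain code is embedded in the pattern code:
        # PAT-INF-002 -> DOM-INF
        return "DOM-" + self.primary_pattern.split("-")[1]

# The deepfake-enabled financial fraud example: primary under
# Information Integrity, secondary under Security & Cyber.
fraud = Classification(
    primary_pattern="PAT-INF-002",
    secondary_patterns=["PAT-SEC-001"],  # placeholder code
)
print(fraud.domain)  # -> DOM-INF
```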
Identification Scheme
All entities in the taxonomy use permanent, structured identifiers that are never reused, reassigned, or deleted.
| Entity | Format | Example | Rule |
|---|---|---|---|
| Incident | INC-YY-NNNN | INC-24-0001 | YY = 2-digit year, NNNN = per-year sequence |
| Domain | DOM-XXX | DOM-INF | 3-letter abbreviation |
| Threat Pattern | PAT-XXX-NNN | PAT-INF-002 | Assigned alphabetically by slug within domain |
| Causal Factor | CAUSE-NNN | CAUSE-001 | Sequential |
| Harm Type | HARM-NNN | HARM-001 | Sequential |
| Asset | ASST-NNN | ASST-001 | Sequential |
| Lifecycle Stage | LIFE-NNN | LIFE-001 | Sequential by stage order |
| Framework | FRMW-NNN | FRMW-001 | Sequential |
Identifiers are permanent. Once assigned, an ID is never reused even if the associated entity is retired or superseded. Pattern codes (PAT-XXX-NNN) are assigned alphabetically by slug within each domain — new patterns receive the next available number. This ensures stable references for external citations and machine-readable integrations.
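The format column of the table above translates directly into validation patterns. The following is a minimal sketch of such checks, assuming domain and pattern abbreviations are always three uppercase letters; it is not an official validator.

```python
import re

# One regular expression per entity type, following the Format column.
ID_FORMATS = {
    "incident":  re.compile(r"INC-\d{2}-\d{4}"),    # INC-YY-NNNN
    "domain":    re.compile(r"DOM-[A-Z]{3}"),       # DOM-XXX
    "pattern":   re.compile(r"PAT-[A-Z]{3}-\d{3}"), # PAT-XXX-NNN
    "cause":     re.compile(r"CAUSE-\d{3}"),
    "harm":      re.compile(r"HARM-\d{3}"),
    "asset":     re.compile(r"ASST-\d{3}"),
    "lifecycle": re.compile(r"LIFE-\d{3}"),
    "framework": re.compile(r"FRMW-\d{3}"),
}

def valid_id(entity: str, identifier: str) -> bool:
    """Check an identifier against its entity's documented format.

    fullmatch ensures the whole string conforms, not just a prefix.
    """
    return ID_FORMATS[entity].fullmatch(identifier) is not None

print(valid_id("incident", "INC-24-0001"))  # True
print(valid_id("pattern", "PAT-INF-02"))    # False: NNN needs 3 digits
```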
Threat Pattern Identification
Threat patterns are identified through systematic review of:
- Academic research and technical literature
- Regulatory filings and enforcement actions
- Incident databases and breach disclosures
- Investigative journalism and primary reporting
- Civil society and watchdog organisation reports
Patterns are included only when they represent a repeatable or generalisable mechanism of harm, rather than a single isolated anomaly. Each pattern must be distinguishable from other patterns in the same domain and linked to at least one documented incident or a well-evidenced risk mechanism.
Evidence Assessment
Evidence is assessed on a three-tier scale — primary, corroborated, or single-source — based on the independence and authority of confirming sources.
Each threat pattern is evaluated against an evidence threshold prior to inclusion. Evidence may include:
- Verified real-world incidents with identified victims and outcomes
- Legal or regulatory findings (court rulings, enforcement actions, audit reports)
- Multiple independent reports describing the same mechanism
- Longitudinal patterns observed across deployments or sectors
Where evidence is emerging or incomplete, this is explicitly noted. Speculative or purely hypothetical risks are excluded from the taxonomy.
Severity Rating Scale
Severity measures the magnitude, reversibility, and breadth of harm caused by an AI-enabled threat, rated on a four-level ordinal scale from critical to low.
Threat patterns are assessed along two independent dimensions:
Severity
Severity reflects the potential magnitude of harm if the threat occurs, considering:
- Scale of impact — number of individuals, organisations, or systems affected
- Reversibility of harm — whether damage can be undone or mitigated after the fact
- Degree of damage — human, economic, or societal consequences
Severity is expressed as an ordinal category (critical → low), not a numerical score.
Likelihood Trend
Likelihood reflects the observed or inferred trend, not a precise probability. Indicators include:
- Increasing frequency of incidents
- Lower barriers to execution
- Broader accessibility of enabling tools
Severity and likelihood assessments are periodically reviewed as new evidence becomes available.
Inclusion and Exclusion Criteria
A threat pattern is included only if it meets all of the following:
- Demonstrates observable harm or credible harm mechanisms
- Is generalisable beyond a single system or deployment
- Can be meaningfully distinguished from other patterns
- Is supported by documented evidence
The following are excluded:
- Purely speculative risks without documented precedent or credible mechanism
- Capability descriptions without harm linkage
- Issues better classified as governance or policy failures alone
- General software bugs or system outages where AI is not a material factor
Reference Frameworks and Alignment
The taxonomy is informed by, but not derived from, existing AI risk classification efforts and regulatory frameworks. Each domain carries a framework_mapping that links it to the closest corresponding categories in three external frameworks.
MIT AI Risk Repository
The MIT AI Risk Repository provides a comprehensive catalogue of AI risk factors drawn from academic and policy literature. TopAIThreats uses it as a comparative reference for threat pattern coverage and to identify categorisation gaps. Each domain maps to one or more MIT risk categories.
EU AI Act
The EU AI Act establishes a risk-based regulatory framework for AI systems deployed in the European Union. TopAIThreats aligns domain definitions with the EU AI Act's risk categories (unacceptable, high-risk, limited, minimal) and high-risk use case groupings to support regulatory cross-referencing.
NIST AI Risk Management Framework
The NIST AI RMF provides a voluntary framework for managing risks associated with AI systems. Each domain maps to relevant NIST AI RMF functions (Govern, Map, Measure, Manage) and associated risk categories, enabling organisations that follow NIST guidance to map their risk assessments against the TopAIThreats taxonomy.
ISO/IEC 42001
ISO/IEC 42001 is the international standard for AI management systems. TopAIThreats maps each domain to relevant ISO/IEC 42001 clauses and control objectives, supporting organisations pursuing certification or implementing AI governance aligned with international standards.
These frameworks inform validation and alignment decisions but do not determine the structure or boundaries of domains. The full framework mapping for each domain is displayed on its detail page. All three frameworks have dedicated pages at Governance Frameworks.
Incident Verification Process
Before any incident is published, it undergoes a six-step verification process:
- Source check — At least one source meets tier requirements and is accessible and verifiable
- Scope check — AI is materially involved and the incident is not purely speculative
- Harm check — Real-world harm is demonstrated or credible risk with evidence exists
- Rating assignment — Status, Severity, and Evidence Level are assigned per the definitions below
- Classification — Primary domain and threat pattern assigned; secondary patterns, causal factors, assets, lifecycle stages, and contextual tags (sectors, regions, affected groups) applied
- Content review — Language is checked for neutrality, sources are cited with superscripts, and no editorialising is present
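The six-step gate above can be sketched as a simple all-or-nothing check. This is an illustrative sketch only: each boolean stands in for a manual editorial judgment, and the step names are taken from the list above.

```python
# Publication steps, in order, from the verification process above.
VERIFICATION_STEPS = (
    "source check",
    "scope check",
    "harm check",
    "rating assignment",
    "classification",
    "content review",
)

def ready_to_publish(results: dict[str, bool]) -> bool:
    """An incident is publishable only when every step has passed.

    A missing step counts as a failure, so partial reviews never pass.
    """
    return all(results.get(step, False) for step in VERIFICATION_STEPS)

print(ready_to_publish({s: True for s in VERIFICATION_STEPS}))  # True
print(ready_to_publish({"source check": True}))                 # False
```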
Source Hierarchy
Sources are evaluated according to a five-tier hierarchy plus a discovery-only category. Discovery-only sources (unverified) are never used as evidence.
| Tier | Source Type | Examples | Trust Level |
|---|---|---|---|
| 1 | Primary | Courts, legal filings, government regulators (FTC, DOJ, Europol), law enforcement press releases, official victim organisation statements | Highest |
| 2 | Authoritative institutional | OECD, NIST, WEF, Big 4 audit firms, academic research institutions, major think tanks | High |
| 3 | Major news organisations | Reuters, BBC, Bloomberg, Wall Street Journal, New York Times | Medium-High |
| 4 | Industry publications | Trade press, sector-specific journals | Medium |
| 5 | Expert commentary | Named expert analysis, conference presentations | Context only |
| — | Discovery only (never primary evidence) | AI Incident Database, blogs, vendor reports without independent confirmation, social media | Not used |
To classify an incident as Confirmed, at least one Tier 1 source or two or more independent Tier 2–3 sources are required. If only one credible source exists, the incident status is set to Alleged pending corroboration.
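The Confirmed/Alleged rule above can be expressed as a short decision function. This is a hedged sketch, not the editorial tool itself: it assumes sources are already known to be independent (independence cannot be inferred from tier numbers alone), and the treatment of Tier 4 sources as sufficient for Alleged status is my reading, not stated in the source.

```python
def incident_status(source_tiers: list[int]) -> str:
    """Derive incident status from the tiers of independent sources.

    Tiers follow the source hierarchy: 1 (primary) to 5 (context
    only). Discovery-only sources are never evidence and should not
    appear in the input at all.
    """
    evidential = [t for t in source_tiers if t <= 4]  # tier 5: context only
    if 1 in evidential:
        return "Confirmed"          # at least one Tier 1 source
    if sum(1 for t in evidential if t in (2, 3)) >= 2:
        return "Confirmed"          # two or more independent Tier 2-3
    if evidential:
        return "Alleged"            # single credible source, pending
    raise ValueError("at least one credible source is required")

print(incident_status([1]))     # Confirmed
print(incident_status([2, 3]))  # Confirmed
print(incident_status([3]))     # Alleged
```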
Incident Rating Definitions
Status
- Confirmed — Primary source verification or multiple independent credible sources
- Alleged — Single credible report; monitoring for corroboration
- Under Investigation — Active investigation by authorities or the organisation involved; outcome pending
Severity
- Critical — Large-scale harm (>$1M aggregate), critical infrastructure involvement, or active ongoing campaigns affecting many victims
- High — Significant harm, multiple victims, or targeting vulnerable populations
- Medium — Confirmed harm but limited in scope or duration
- Low — Proof-of-concept, minor impact, or harm limited to a single instance
Evidence Level
- Primary — Direct official confirmation (courts, regulators, victim statements)
- Corroborated — Multiple independent credible sources
- Single-source — One credible report, awaiting corroboration
Resolution Status
- Open — Incident is ongoing, under investigation, or not yet resolved
- Resolved — Incident has been addressed, remediated, or concluded
Update Policy
Incidents are updated when new sources confirm or expand the event, status or severity changes, new outcomes emerge, or timeline events occur. All changes are recorded in the Update Log on each incident page. No change is made silently.
The following actions are not taken: incidents are not deleted, past content is not silently edited, incident IDs are not changed, and earlier interpretations are not rewritten. If an incident is reclassified, the original classification is preserved in the update history.
For the taxonomy itself, new threat patterns may be added as evidence emerges, existing classifications may be refined or merged, and severity and likelihood assessments may be updated. Significant changes are documented in the Changelog.
Editorial Process
Every incident undergoes a multi-step verification process and requires explicit owner approval before publication — no content is auto-published.
Incidents are identified through three channels:
- Automated discovery — Daily scans of 11 RSS feeds and 2 HTML scrapers, plus Reddit monitoring, filtered against 156 keywords covering all 42 threat patterns
- Manual submission — Incidents identified through direct research, reader tips, or editorial review
- Watchlist monitoring — Open incidents are monitored three times per week for new developments
High- and medium-priority candidates trigger a Telegram notification for editorial review. No incident is published without explicit approval. This governance rule is non-negotiable.
What TopAIThreats Is NOT
This site is a classification and reference system, not a news outlet, advocacy platform, or speculative risk forecaster.
TopAIThreats does not:
- Estimate precise probabilities or predict future AI capabilities
- Rank organisations, products, or AI systems
- Advocate for specific policies or regulatory positions
- Publish speculative, hypothetical, or unverified content
- Provide legal, compliance, or professional security advice
- Replace domain-specific safety assessments or audits
Its purpose is to provide a shared analytical structure for understanding AI-enabled threats across disciplines and contexts.
Transparency
Methodological assumptions, evidence limitations, and classification uncertainties are intentionally surfaced wherever relevant. This is done to support critical evaluation, reuse, and adaptation by researchers, policymakers, and practitioners.
The taxonomy and all incident data are available in machine-readable formats through the API & Data Access hub, including JSON endpoints for the taxonomy structure, incident data, and a knowledge graph. An LLM-optimised text endpoint and RSS feed are also provided.
How to Cite
The TopAIThreats methodology and taxonomy are designed to be cited as a reference framework. Use the following formats for academic and professional citation.
Suggested Citation (APA)
TopAIThreats.com. (2026). Methodology — TopAIThreats Classification Framework (Taxonomy v3.0). Retrieved 2026-03-04, from https://topaithreats.com/methodology/
BibTeX
@misc{topaithreats2026methodology,
  title = {Methodology -- TopAIThreats Classification Framework},
  author = {{TopAIThreats.com}},
  year = {2026},
  url = {https://topaithreats.com/methodology/},
  note = {Taxonomy version 3.0, last updated 2026-03-04}
}

Citing Individual Incidents
Each incident page includes a "How to Cite" section with a pre-formatted citation including the incident ID, title, and direct URL. For programmatic access to citation data, see the Knowledge Graph & Citations API.
Version Note
When citing, include the access date, as the taxonomy may be updated. The current taxonomy structure is documented at /taxonomy/.
Example
To see this methodology applied in practice, see INC-24-0001: Hong Kong Deepfake CFO Video Conference Fraud — a confirmed, critical-severity incident with primary-source evidence, classified under Information Integrity (DOM-INF) with the threat pattern Deepfake Identity Hijacking (PAT-INF-002).
The incident demonstrates the full classification model: a primary pattern, secondary patterns from other domains, causal factor tagging (CAUSE-001: Intentional Misuse), harm type (HARM-002: Financial), asset identification (ASST-002: Generative Model), and lifecycle stage mapping.
Use in Retrieval
This page documents the complete classification methodology for the TopAIThreats taxonomy (version 3.0), an evidence-based framework for cataloguing AI-enabled threats across eight domains and 42 threat patterns. The methodology covers incident verification standards, source hierarchy, severity rating scales, cross-domain classification rules, identification schemes, and alignment with three external governance frameworks (NIST AI RMF, EU AI Act, ISO/IEC 42001). It is maintained at topaithreats.com/methodology and is designed to be cited as a reference framework by researchers, regulators, and AI systems.
Detailed Methodology
- Research Methodology — How threat patterns are identified, inclusion criteria, taxonomy governance, and framework mappings
- Data Collection Methodology — Discovery channels, verification process, update triggers, and known data gaps
→ About TopAIThreats · → Taxonomy · → API & Data Access · → Contributing