Research Methodology
Last updated: 2026-03-15 · Taxonomy version 3.0
Overview
This page documents the research methodology used to identify, define, and classify AI threat patterns in the TopAIThreats taxonomy. Pattern identification draws on systematic review of published sources across academic, regulatory, and journalistic domains, combined with expert judgment about generalisability and harm significance. The process is designed to ensure that threat patterns reflect repeatable mechanisms of real-world harm rather than isolated anomalies or speculative risk.
The core classification principle — threats are defined by the harm they cause, not by the technology used to cause them — governs all research and classification decisions. The full classification framework is described on the Methodology page.
How Threat Patterns Are Identified
Threat patterns are identified through ongoing review of published sources across multiple disciplines. No single source type is treated as definitive; patterns are recognised when the same harm mechanism appears across independent sources in a way that suggests generalisability beyond a single event or deployment context.
Source Types Reviewed
Pattern identification draws on the following source categories:
| Source Category | Examples | Primary Use |
|---|---|---|
| Academic and technical literature | Peer-reviewed papers, conference proceedings (NeurIPS, IEEE S&P, USENIX), preprints | Capability demonstration, mechanism documentation |
| Regulatory and legal sources | Enforcement actions, court filings, government agency reports, audit findings | Confirmed real-world harm, legal precedent |
| Incident databases | AI Incident Database (AIID), OECD AI incidents, sector-specific breach reports | Incident discovery, cross-reference (not used as primary evidence) |
| Investigative journalism | Major news organisations, specialist technology and security press | Narrative context, initial incident signals |
| Civil society and watchdog reports | NGO investigations, digital rights organisations, whistleblower disclosures | Underreported harms, structural patterns |
| Institutional frameworks | MITRE ATLAS, NIST AI RMF, EU AI Act annexes, OWASP LLM Top 10 | Comparative taxonomy validation, gap identification |
Incident databases such as AIID are used for discovery and cross-reference only — they do not constitute primary evidence for classification. For source trust levels applied to individual incidents, see the Source Hierarchy.
Domain and Pattern Structure
The 8-domain taxonomy was constructed using a harm-first classification model: domains are defined by the nature of harm caused, not by the AI technology involved. This approach allows the taxonomy to remain applicable as AI architectures and deployment modalities change over time.
The eight domains were derived by grouping observed and documented harm mechanisms into categories that are meaningfully distinct from one another and collectively sufficient to cover the current landscape of AI-enabled threats. Each domain contains 4–6 threat patterns (42 total) describing specific mechanisms through which harm in that domain occurs.
The domain structure was developed in alignment with the risk categorisation models used in four established frameworks: the NIST AI Risk Management Framework, the EU AI Act, ISO/IEC 42001, and the MIT AI Risk Repository. These frameworks informed the initial scoping of harm categories and the boundaries between domains, though the TopAIThreats taxonomy does not derive its structure from any one of them directly. Where domain boundaries diverge from framework categories, this reflects differences in purpose: governance frameworks are designed to assign regulatory obligations; this taxonomy is designed to describe harm mechanisms.
Pattern codes are assigned alphabetically by slug within each domain (PAT-XXX-001 through PAT-XXX-NNN) and are permanent — they are never reused or reassigned. The full domain and pattern listing is at Taxonomy.
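The alphabetical code-assignment rule can be sketched as a small function. This is an illustration, not the site's actual tooling; the domain code `FIN` and the slugs in the usage example are invented, and a real implementation would also have to append after the highest code already issued so that existing codes stay permanent.

```python
def assign_pattern_codes(domain_code, slugs):
    """Assign pattern codes alphabetically by slug within one domain.

    Codes follow PAT-XXX-NNN, where XXX is the domain code and NNN is a
    zero-padded sequence number. This sketch shows initial assignment only;
    since codes are never reused, later additions would extend the sequence
    rather than re-sort existing codes.
    """
    return {
        slug: f"PAT-{domain_code}-{n:03d}"
        for n, slug in enumerate(sorted(slugs), start=1)
    }

# Hypothetical usage with invented slugs:
codes = assign_pattern_codes(
    "FIN", ["voice-cloning-fraud", "payment-redirect", "synthetic-identity"]
)
```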
Inclusion and Exclusion Criteria for Patterns
A threat pattern is added to the taxonomy only when it satisfies all four of the following criteria:
| Criterion | Definition |
|---|---|
| Harm linkage | The pattern is associated with observable harm or a clearly evidenced mechanism through which harm occurs — not purely a capability description |
| Generalisability | The mechanism has been observed across more than one system, deployment, or context — it is not specific to a single product or incident |
| Distinguishability | The pattern can be meaningfully separated from other patterns within its domain based on its harm mechanism or attack vector |
| Documentary support | At least one credible source documents the pattern operating in a real-world or near-real-world context (lab demonstrations without deployment evidence qualify only at signal level) |
The following are excluded:
- Purely speculative risks with no documented precedent or credible evidence of a real deployment
- Capability descriptions without harm linkage (e.g. a model can do X, with no evidence of harm from X)
- Issues better classified as governance, policy, or compliance failures where AI is not a material factor
- General software bugs, data quality issues, or system outages where AI plays no causal role
Where a pattern is borderline, it is held at signal stage with single-source evidence until corroboration is available. The distinction between a threat pattern and a one-off incident is fundamentally a question of generalisability: a single documented event is an incident; a repeatable mechanism observed across multiple events is a pattern.
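The four-criteria gate and the signal fallback described above can be expressed as a small decision function. This is a minimal sketch, assuming field names of my own choosing; the actual review process is editorial judgment, not an automated check.

```python
from dataclasses import dataclass

@dataclass
class PatternProposal:
    """One flag per inclusion criterion; field names are illustrative."""
    harm_linkage: bool          # observable harm or clearly evidenced harm mechanism
    generalisability: bool      # observed across more than one system or context
    distinguishability: bool    # separable from other patterns in its domain
    documentary_support: bool   # at least one credible (near-)real-world source

def proposal_status(p: PatternProposal) -> str:
    """All four criteria must hold for a pattern; borderline cases become signals."""
    if (p.harm_linkage and p.generalisability
            and p.distinguishability and p.documentary_support):
        return "pattern"
    if p.harm_linkage and p.documentary_support:
        # Held with single-source evidence until corroboration is available.
        return "signal"
    return "excluded"
```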
How the Taxonomy Is Updated
Adding New Patterns
When new AI capabilities or deployment contexts produce harm mechanisms not covered by existing patterns, a new pattern may be proposed for evaluation. The evaluation process follows these steps:
- Signal identification — One or more incidents or research findings suggest a harm mechanism that does not map cleanly to an existing pattern
- Generalisability assessment — Evidence is gathered to assess whether the mechanism is repeatable and observable across multiple contexts, or specific to a single event
- Distinctiveness check — The proposed pattern is compared against all 42 existing patterns to confirm it cannot be adequately represented as a sub-variant of an existing one
- Domain assignment — The primary domain is determined by the dominant harm type, following the harm-first classification principle
- Owner approval — Taxonomy additions require explicit owner approval before publication. No new pattern is published without this approval gate
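The five-step evaluation pipeline above, including the owner-approval gate, can be sketched as an ordered checklist. The step identifiers are paraphrases of the list, not names used by the project itself.

```python
# Ordered evaluation steps; names are paraphrases of the published process.
EVALUATION_STEPS = (
    "signal_identification",
    "generalisability_assessment",
    "distinctiveness_check",
    "domain_assignment",
    "owner_approval",   # hard gate: no new pattern is published without it
)

def can_publish(completed_steps) -> bool:
    """A proposed pattern may be published only once every step has completed."""
    return set(EVALUATION_STEPS) <= set(completed_steps)
```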
Taxonomy changes are documented in the Changelog. The current version (3.0) and all previous frozen versions are available at Taxonomy.
Pattern Deprecation and Merging
Patterns are not deleted. If evidence emerges that two patterns are better understood as a single mechanism, they may be merged, with the retired pattern code preserved as an alias and the merge documented in the Changelog. If a pattern no longer reflects current AI capabilities or documented harms (for example, because a technology has become obsolete), it is marked as retired rather than removed, and its incidents remain classified under it for historical accuracy.
Pattern codes are permanent identifiers. Retired codes are never reused or reassigned.
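The alias behaviour of retired codes can be sketched as a lookup table. The codes and the merge shown are invented examples; the cycle guard is a defensive addition, not a documented requirement.

```python
# Hypothetical alias table: each retired code points to the pattern it merged into.
MERGE_ALIASES = {
    "PAT-ECO-005": "PAT-ECO-002",   # illustrative merge; codes are invented
}

def resolve_pattern_code(code: str) -> str:
    """Follow merge aliases so retired codes keep resolving; codes are never reused."""
    seen = set()
    while code in MERGE_ALIASES:
        if code in seen:            # guard against an accidental alias cycle
            raise ValueError(f"alias cycle at {code}")
        seen.add(code)
        code = MERGE_ALIASES[code]
    return code
```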
How Framework Mappings Are Constructed
Each of the eight domains carries a framework_mapping linking it to the closest corresponding categories, requirements, or risk tiers in three external governance frameworks: NIST AI RMF, the EU AI Act, and ISO/IEC 42001. These mappings are constructed as follows:
| Framework | Mapping Approach |
|---|---|
| NIST AI RMF | Each domain is mapped to the relevant NIST AI RMF functions (Govern, Map, Measure, Manage) and associated subcategories. The mapping identifies where incidents in the domain most commonly surface within the NIST risk management lifecycle. |
| EU AI Act | Domains are mapped to the EU AI Act's risk tier system (unacceptable, high-risk, limited, minimal) and the relevant high-risk use case groupings in Annex III. The mapping supports cross-referencing with regulatory obligations. |
| ISO/IEC 42001 | Domains are mapped to relevant ISO/IEC 42001 clauses and control objectives. The mapping supports organisations implementing AI governance against the international standard. |
These frameworks inform alignment decisions but do not determine domain boundaries. Where a domain corresponds to multiple framework categories, or where no clean mapping exists, this is noted explicitly in the domain's framework mapping section. The full mapping for each domain is displayed on its domain page.
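A domain's framework_mapping record might take roughly the following shape. The keys and values here are assumptions inferred from the table above, not the live site data; empty lists stand in for details that vary by domain.

```python
# Illustrative shape of one domain's framework_mapping record.
# Field names and values are assumptions, not the published data model.
framework_mapping = {
    "nist_ai_rmf": {
        "functions": ["Map", "Measure"],   # where incidents most often surface
        "subcategories": [],               # associated subcategory identifiers
    },
    "eu_ai_act": {
        "risk_tier": "high-risk",          # unacceptable / high-risk / limited / minimal
        "annex_iii_groupings": [],         # relevant high-risk use case groupings
    },
    "iso_iec_42001": {
        "clauses": [],                     # relevant clauses and control objectives
    },
    "notes": "Recorded explicitly where no clean one-to-one mapping exists.",
}
```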
The MIT AI Risk Repository is used as a comparative reference for taxonomy coverage validation — identifying whether known AI risk categories from the academic literature are represented in the TopAIThreats pattern set. It is not a classification source.
Limitations and Scope
The following limitations apply to the research methodology and should be considered when using the taxonomy for research or policy purposes:
- English-language bias — Source monitoring is primarily conducted in English. Incidents reported only in other languages are likely underrepresented, particularly from East Asia, South Asia, and Latin America.
- Visibility bias — Incidents that attract public reporting or regulatory attention are more likely to be identified than harms that occur privately, in closed systems, or in contexts where victims lack the means or incentive to report.
- No probability estimation — The taxonomy does not estimate the likelihood of future incidents. Severity and likelihood-trend assessments are qualitative, not quantitative.
- No policy prescription — Classification of a threat does not constitute a recommendation for how to regulate, mitigate, or respond to it.
- Classification review — Incident classification and pattern assignments are reviewed by multiple contributors before publication, but no formal inter-rater reliability scoring is applied. Disagreements with a classification, or mismatches between a classification and its cited sources, can be reported via the Contributing page.
- Evolving taxonomy — The 8-domain, 42-pattern structure reflects the current state of AI-enabled threats. As AI capabilities develop, new harm mechanisms will emerge that may require new patterns or domain revisions.
These limitations are documented here intentionally. A taxonomy that acknowledges its constraints is more useful as a reference than one that does not.