
Methodology

Last updated: 2026-03-04

This page describes how the TopAIThreats taxonomy is constructed, validated, and maintained. It explains how threat patterns are identified, how domains are defined, how evidence and severity judgments are applied, and how the full taxonomy structure — including causal factors, harm types, assets, lifecycle stages, and governance framework mappings — supports consistent classification of AI-enabled threats.

The methodology is designed to prioritise clarity, reproducibility, and real-world relevance, rather than speculative or purely technical risk modelling.

Detailed Methodology

Research Methodology — How threat patterns are identified, inclusion criteria, taxonomy governance, and framework mappings
Data Collection Methodology — Discovery channels, verification process, update triggers, and known data gaps

What Is an AI Threat?

An AI threat is any real-world harm or credible risk of harm that is materially enabled, amplified, or caused by an artificial intelligence system.

This definition requires that AI be a necessary component of the threat mechanism — not merely present in the technology stack. A cyberattack that happens to use an AI-powered tool qualifies only if the AI capability materially changes the nature, scale, or feasibility of the attack. General software failures, human errors, and policy disputes without an AI component are excluded.

What Qualifies as an Incident?

An incident is a documented event in which an AI system caused, contributed to, or demonstrated credible potential for real-world harm to individuals, organisations, or society.

An incident is considered in scope when two conditions are met:

  1. AI is a material component of the incident (not merely "digital" or "cyber" — AI must have caused, enabled, or amplified the event)
  2. One of the following is documented:
    • Real-world harm — financial loss, privacy breach, physical harm, discrimination, reputational or psychological damage
    • A verified system failure with credible risk of harm (near-miss)
    • A capability demonstration indicating a dangerous failure mode (early warning signal)
    • A structural threat pattern emerging across multiple incidents (systemic risk)

Incidents that are purely theoretical, unverified, or lack credible sourcing are excluded. Signal incidents must demonstrate a clear dangerous capability or a system failure in a real-world deployment — speculative blog posts and theoretical concerns without evidence are not sufficient. Each incident requires at least one verifiable source meeting the source hierarchy standards below.
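The two-part scope test can be sketched as a small checklist function. This is an illustrative sketch only; the field names and evidence-category labels are assumptions, not the site's actual data schema.

```python
from dataclasses import dataclass

# Hypothetical labels for the four evidence categories in condition 2 above
EVIDENCE_TYPES = {"real_world_harm", "near_miss", "capability_demo", "systemic_pattern"}

@dataclass
class IncidentCandidate:
    ai_is_material: bool           # condition 1: AI caused, enabled, or amplified the event
    documented_evidence: set[str]  # which of the four evidence categories are documented
    has_verifiable_source: bool    # at least one source meeting the hierarchy standards

def in_scope(c: IncidentCandidate) -> bool:
    """In scope when AI is material, at least one evidence category
    is documented, and sourcing is verifiable."""
    return (c.ai_is_material
            and bool(c.documented_evidence & EVIDENCE_TYPES)
            and c.has_verifiable_source)
```

A candidate failing any one of the three checks is excluded, mirroring the conjunction of conditions in the text.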

Failure Stages

Every incident is assigned a failure stage reflecting its position in the threat progression model:

Signal
Early demonstration of a dangerous capability. The AI system showed it could cause harm, even if no harm occurred. Requires primary or corroborated evidence.
Near Miss
A system failure occurred but harm was avoided or limited. The incident demonstrates a credible risk that could escalate.
Harm
Real-world damage occurred — financial, physical, reputational, psychological, or to rights and freedoms.
Systemic Risk
Multiple incidents demonstrate a structural threat pattern. The risk is not isolated but represents an emerging systemic concern.

Severity ratings (critical, high, medium, low) apply independently to all failure stages. A signal can be rated high severity if the demonstrated capability is particularly dangerous; a harm incident can be rated low if the impact was minor.
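That independence can be sketched by modelling failure stage and severity as two orthogonal enumerations. The names here are illustrative, not a published schema.

```python
from enum import Enum

class FailureStage(Enum):
    SIGNAL = "signal"                # dangerous capability demonstrated, no harm occurred
    NEAR_MISS = "near_miss"          # failure occurred, harm avoided or limited
    HARM = "harm"                    # real-world damage occurred
    SYSTEMIC_RISK = "systemic_risk"  # structural pattern across multiple incidents

class Severity(Enum):
    CRITICAL = 4
    HIGH = 3
    MEDIUM = 2
    LOW = 1

# Stage and severity vary independently: a signal can be high severity
# if the demonstrated capability is dangerous, and a harm incident can
# be low severity if the impact was minor.
dangerous_signal = (FailureStage.SIGNAL, Severity.HIGH)
minor_harm = (FailureStage.HARM, Severity.LOW)
```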

Purpose of the Methodology

The goal of this methodology is to ensure that the TopAIThreats classification framework:

  • Classifies AI-enabled risks consistently across contexts
  • Focuses on observable harm, verified failures, and demonstrated capabilities, not implementation details
  • Remains grounded in documented real-world events across all failure stages
  • Can evolve as new threat patterns emerge
  • Provides a shared vocabulary usable by researchers, regulators, and AI systems alike

This methodology governs inclusion, classification, and maintenance, but does not prescribe policy or mitigation strategies.

Core Classification Principle

Threats are classified by the nature of harm they cause, not by the technology used to cause them.

Impact-First, Not Technology-First

Threats are classified based on observable impact and harm, rather than:

  • Model architecture
  • Training technique
  • Deployment modality
  • AI capability class

This ensures that the taxonomy remains applicable across current and future AI systems, even as underlying technologies change.

The 8-Domain Taxonomy

The taxonomy organises all AI-enabled threats into eight domains based on the primary category of harm, with 42 threat patterns describing specific mechanisms through which harm occurs.

The eight domains are:

  1. Information Integrity (DOM-INF) — Threats to the reliability and authenticity of information, including deepfakes, disinformation, and synthetic media
  2. Security & Cyber (DOM-SEC) — AI-enhanced cyberattacks, adversarial evasion, automated vulnerability exploitation, and AI-morphed malware
  3. Privacy & Surveillance (DOM-PRI) — Mass surveillance, biometric tracking, inference attacks, and erosion of informational self-determination
  4. Discrimination & Social Harm (DOM-SOC) — Algorithmic bias, discriminatory outcomes, and AI systems that reinforce or amplify social inequities
  5. Economic & Labor (DOM-ECO) — Workforce displacement, market manipulation, economic concentration, and labour exploitation enabled by AI
  6. Human-AI Control (DOM-CTL) — Loss of meaningful human oversight, autonomy erosion, and failures of alignment between AI behaviour and human intent
  7. Agentic Systems (DOM-AGT) — Risks from AI systems that act with increasing independence, including autonomous weapons, self-modifying systems, and agentic tool use
  8. Systemic Risk (DOM-SYS) — Large-scale risks including critical infrastructure failure, existential risk, and cascading systemic effects

The full classification structure with all 42 threat patterns is available on the Taxonomy page.

Domain Structure

The framework consists of:

  • 8 high-level domains, each representing a distinct category of harm
  • 4–6 threat patterns per domain (42 total), describing concrete mechanisms through which harm occurs

Each threat pattern is designed to be:

  • Mutually distinguishable from other patterns within its domain
  • Operationally meaningful — linked to observable behaviours and outcomes
  • Observable in real-world incidents — not purely theoretical constructs

Domains are not intended to be strictly exclusive. Where appropriate, cross-domain intersections are explicitly documented, while maintaining a primary classification for consistency. The full classification structure is available on the Taxonomy page.

Taxonomy Dimensions

Beyond the domain–pattern hierarchy, the taxonomy includes additional dimensions that provide context for each incident and pattern.

Causal Factors

Causal factors describe why a threat occurred. The taxonomy identifies 15 causal factors grouped into four categories: malicious misuse, design and development failures, deployment and integration failures, and systemic-organisational failures. Each incident and pattern is tagged with one or more causal factors to support root-cause analysis.

See the full listing at Causal Factors.

Harm Types

Harm types describe what kind of damage results from a threat. The taxonomy identifies seven harm types: physical, financial, privacy, discrimination, reputational, psychological, and systemic. These complement the domain classification by capturing the nature of the consequence rather than the mechanism of the threat.

See the full listing at Harm Types.

Assets and Technologies

Assets describe what is targeted or involved in a threat. The taxonomy identifies 12 asset types grouped into five categories: data assets, model assets, system assets, infrastructure assets, and process assets. Each incident is tagged with the assets relevant to its threat mechanism.

See the full listing at Assets & Technologies.

Attack Lifecycle

The attack lifecycle describes when in a threat's progression each incident or pattern occurs. Six ordered stages are defined: reconnaissance, weaponisation, delivery, exploitation, persistence, and detection and response. These stages follow the conventional cyber kill chain model adapted for AI-specific threats.

See the full listing at Attack Lifecycle.

Governance Frameworks

Each domain carries a mapping to three external governance frameworks: the NIST AI Risk Management Framework, the EU AI Act, and ISO/IEC 42001. These mappings identify the closest corresponding risk categories in each framework, enabling cross-referencing between the TopAIThreats taxonomy and established regulatory and standards bodies.

See the full listing at Governance Frameworks.

Risk Assessment

For critical and high-severity incidents, the taxonomy includes a structured risk assessment capturing two dimensions: impact scope (individual, organisational, sectoral, or societal) and reversibility (reversible, partially reversible, or irreversible). This provides additional context for severity judgments beyond the four-level ordinal scale.

Cross-Domain Classification

Many AI-enabled threats span multiple domains. The taxonomy handles this through a primary–secondary classification model.

Each incident is assigned:

  • One primary pattern — the dominant harm mechanism, which determines the incident's domain classification
  • Zero or more secondary patterns — contributing or enabling mechanisms from any domain

The primary classification reflects the most significant harm observed or the mechanism most directly responsible for the outcome. Secondary patterns capture additional dimensions of the threat without diluting the primary classification.

For example, a deepfake-enabled financial fraud may have a primary classification under Information Integrity (the deepfake mechanism) and a secondary classification under Security & Cyber (the social engineering attack vector). The primary determines which domain page the incident appears under; the secondary enables cross-domain analysis.

Domain and pattern pages document known cross-domain interactions explicitly. The full structure is available on the Taxonomy page.
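The primary–secondary model can be sketched as a small data structure. This is illustrative only; in particular, the secondary pattern code used for the social-engineering vector is a hypothetical placeholder, not a real taxonomy ID.

```python
from dataclasses import dataclass, field

@dataclass
class Classification:
    primary_pattern: str                          # dominant harm mechanism
    secondary_patterns: list[str] = field(default_factory=list)

    @property
    def domain(self) -> str:
        # Pattern codes embed the 3-letter domain code: PAT-INF-002 -> DOM-INF,
        # so the primary pattern alone determines the domain classification.
        return "DOM-" + self.primary_pattern.split("-")[1]

# The deepfake-enabled financial fraud example from the text:
fraud = Classification(
    primary_pattern="PAT-INF-002",       # deepfake mechanism (Information Integrity)
    secondary_patterns=["PAT-SEC-001"],  # hypothetical code for the social-engineering vector
)
```

The `domain` property reflects the rule that the primary pattern determines which domain page the incident appears under, while secondary patterns remain available for cross-domain analysis.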

Identification Scheme

All entities in the taxonomy use permanent, structured identifiers that are never reused, reassigned, or deleted.

Each entity type follows a fixed identifier format:

  • Incident: INC-YY-NNNN (e.g. INC-24-0001). YY is the 2-digit year; NNNN is a per-year sequence.
  • Domain: DOM-XXX (e.g. DOM-INF). XXX is a 3-letter abbreviation.
  • Threat Pattern: PAT-XXX-NNN (e.g. PAT-INF-002). Assigned alphabetically by slug within each domain.
  • Causal Factor: CAUSE-NNN (e.g. CAUSE-001). Sequential.
  • Harm Type: HARM-NNN (e.g. HARM-001). Sequential.
  • Asset: ASST-NNN (e.g. ASST-001). Sequential.
  • Lifecycle Stage: LIFE-NNN (e.g. LIFE-001). Sequential by stage order.
  • Framework: FRMW-NNN (e.g. FRMW-001). Sequential.

Once assigned, an ID is never reused, even if the associated entity is retired or superseded; new threat patterns receive the next available number within their domain. This ensures stable references for external citations and machine-readable integrations.
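The format rules above can be expressed as validation patterns. This is a sketch of how a consumer of the data might check identifiers; the site does not publish an official validator.

```python
import re

# One pattern per entity type, following the identifier formats above.
ID_PATTERNS = {
    "incident": re.compile(r"INC-\d{2}-\d{4}"),        # INC-YY-NNNN
    "domain": re.compile(r"DOM-[A-Z]{3}"),             # DOM-XXX
    "threat_pattern": re.compile(r"PAT-[A-Z]{3}-\d{3}"),  # PAT-XXX-NNN
    "causal_factor": re.compile(r"CAUSE-\d{3}"),
    "harm_type": re.compile(r"HARM-\d{3}"),
    "asset": re.compile(r"ASST-\d{3}"),
    "lifecycle_stage": re.compile(r"LIFE-\d{3}"),
    "framework": re.compile(r"FRMW-\d{3}"),
}

def is_valid_id(entity_type: str, identifier: str) -> bool:
    """Check an identifier against its entity type's format rule."""
    pattern = ID_PATTERNS.get(entity_type)
    # fullmatch requires the whole string to match, so no anchors are needed
    return bool(pattern and pattern.fullmatch(identifier))
```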

Threat Pattern Identification

Threat patterns are identified through systematic review of:

  • Academic research and technical literature
  • Regulatory filings and enforcement actions
  • Incident databases and breach disclosures
  • Investigative journalism and primary reporting
  • Civil society and watchdog organisation reports

Patterns are included only when they represent a repeatable or generalisable mechanism of harm, rather than a single isolated anomaly. Each pattern must be distinguishable from other patterns in the same domain and linked to at least one documented incident or a well-evidenced risk mechanism.

Evidence Assessment

Evidence is assessed on a three-tier scale — primary, corroborated, or single-source — based on the independence and authority of confirming sources.

Each threat pattern is evaluated against an evidence threshold prior to inclusion. Evidence may include:

  • Verified real-world incidents with identified victims and outcomes
  • Legal or regulatory findings (court rulings, enforcement actions, audit reports)
  • Multiple independent reports describing the same mechanism
  • Longitudinal patterns observed across deployments or sectors

Where evidence is emerging or incomplete, this is explicitly noted. Speculative or purely hypothetical risks are excluded from the taxonomy.

Severity Rating Scale

Severity measures the magnitude, reversibility, and breadth of harm caused by an AI-enabled threat, rated on a four-level ordinal scale from critical to low.

Threat patterns are assessed along two independent dimensions:

Severity

Severity reflects the potential magnitude of harm if the threat occurs, considering:

  • Scale of impact — number of individuals, organisations, or systems affected
  • Reversibility of harm — whether damage can be undone or mitigated after the fact
  • Degree of damage — human, economic, or societal consequences

Severity is expressed as an ordinal category (critical → low), not a numerical score.

Likelihood Trend

Likelihood reflects the observed or inferred trend, not a precise probability. Indicators include:

  • Increasing frequency of incidents
  • Lower barriers to execution
  • Broader accessibility of enabling tools

Severity and likelihood assessments are periodically reviewed as new evidence becomes available.

Inclusion and Exclusion Criteria

A threat pattern is included only if it meets all of the following:

  • Demonstrates observable harm or credible harm mechanisms
  • Is generalisable beyond a single system or deployment
  • Can be meaningfully distinguished from other patterns
  • Is supported by documented evidence

The following are excluded:

  • Purely speculative risks without documented precedent or credible mechanism
  • Capability descriptions without harm linkage
  • Issues better classified as governance or policy failures alone
  • General software bugs or system outages where AI is not a material factor

Reference Frameworks and Alignment

The taxonomy is informed by, but not derived from, existing AI risk classification efforts and regulatory frameworks. Each domain carries a framework_mapping that links it to the closest corresponding categories in three external frameworks.

MIT AI Risk Repository

The MIT AI Risk Repository provides a comprehensive catalogue of AI risk factors drawn from academic and policy literature. TopAIThreats uses it as a comparative reference for threat pattern coverage and to identify categorisation gaps. Each domain maps to one or more MIT risk categories.

EU AI Act

The EU AI Act establishes a risk-based regulatory framework for AI systems deployed in the European Union. TopAIThreats aligns domain definitions with the EU AI Act's risk categories (unacceptable, high-risk, limited, minimal) and high-risk use case groupings to support regulatory cross-referencing.

NIST AI Risk Management Framework

The NIST AI RMF provides a voluntary framework for managing risks associated with AI systems. Each domain maps to relevant NIST AI RMF functions (Govern, Map, Measure, Manage) and associated risk categories, enabling organisations that follow NIST guidance to map their risk assessments against the TopAIThreats taxonomy.

ISO/IEC 42001

ISO/IEC 42001 is the international standard for AI management systems. TopAIThreats maps each domain to relevant ISO/IEC 42001 clauses and control objectives, supporting organisations pursuing certification or implementing AI governance aligned with international standards.

These frameworks inform validation and alignment decisions but do not determine the structure or boundaries of domains. The full framework mapping for each domain is displayed on its detail page. All three frameworks have dedicated pages at Governance Frameworks.

Incident Verification Process

Before any incident is published, it undergoes a six-step verification process:

  1. Source check — At least one source meets tier requirements and is accessible and verifiable
  2. Scope check — AI is materially involved and the incident is not purely speculative
  3. Harm check — Real-world harm is demonstrated or credible risk with evidence exists
  4. Rating assignment — Status, Severity, and Evidence Level are assigned per the definitions below
  5. Classification — Primary domain and threat pattern assigned; secondary patterns, causal factors, assets, lifecycle stages, and contextual tags (sectors, regions, affected groups) applied
  6. Content review — Language is checked for neutrality, sources are cited with superscripts, and no editorialising is present

Source Hierarchy

Sources are evaluated according to a five-tier hierarchy plus a discovery-only category. Discovery-only sources are never used as evidence.

  • Tier 1 (Primary): courts, legal filings, government regulators (FTC, DOJ, Europol), law enforcement press releases, official victim organisation statements. Trust level: highest.
  • Tier 2 (Authoritative institutional): OECD, NIST, WEF, Big 4 audit firms, academic research institutions, major think tanks. Trust level: high.
  • Tier 3 (Major news organisations): Reuters, BBC, Bloomberg, Wall Street Journal, New York Times. Trust level: medium-high.
  • Tier 4 (Industry publications): trade press, sector-specific journals. Trust level: medium.
  • Tier 5 (Expert commentary): named expert analysis, conference presentations. Used for context only.
  • Discovery only (never primary evidence): AI Incident Database, blogs, vendor reports without independent confirmation, social media. Not used as evidence.

To classify an incident as Confirmed, at least one Tier 1 source or two or more independent Tier 2–3 sources are required. If only one credible source exists, the incident status is set to Alleged pending corroboration.
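The confirmation rule can be sketched as a small function. This is illustrative; tier numbers follow the hierarchy above, and discovery-only sources are assumed to have been excluded before the check.

```python
def incident_status(source_tiers: list[int]) -> str:
    """Apply the confirmation rule: any Tier 1 source, or two or more
    independent Tier 2-3 sources, yields Confirmed; otherwise a single
    credible source yields Alleged, pending corroboration."""
    if 1 in source_tiers:
        return "Confirmed"  # one primary source suffices
    if sum(1 for t in source_tiers if t in (2, 3)) >= 2:
        return "Confirmed"  # two+ independent Tier 2-3 sources
    return "Alleged"        # single credible source, monitoring for corroboration
```

Note the third status in the definitions below, Under Investigation, depends on external facts (an active investigation) rather than on source tiers, so it is not derivable from this check.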

Incident Rating Definitions

Status

  • Confirmed — Primary source verification or multiple independent credible sources
  • Alleged — Single credible report; monitoring for corroboration
  • Under Investigation — Active investigation by authorities or the organisation involved; outcome pending

Severity

  • Critical — Large-scale harm (>$1M aggregate), critical infrastructure involvement, or active ongoing campaigns affecting many victims
  • High — Significant harm, multiple victims, or targeting vulnerable populations
  • Medium — Confirmed harm but limited in scope or duration
  • Low — Proof-of-concept, minor impact, or harm limited to a single instance

Evidence Level

  • Primary — Direct official confirmation (courts, regulators, victim statements)
  • Corroborated — Multiple independent credible sources
  • Single-source — One credible report, awaiting corroboration

Resolution Status

  • Open — Incident is ongoing, under investigation, or not yet resolved
  • Resolved — Incident has been addressed, remediated, or concluded

Update Policy

Incidents are updated when new sources confirm or expand the event, status or severity changes, new outcomes emerge, or timeline events occur. All changes are recorded in the Update Log on each incident page. No change is made silently.

The following actions are not taken: incidents are not deleted, past content is not silently edited, incident IDs are not changed, and earlier interpretations are not rewritten. If an incident is reclassified, the original classification is preserved in the update history.

For the taxonomy itself, new threat patterns may be added as evidence emerges, existing classifications may be refined or merged, and severity and likelihood assessments may be updated. Significant changes are documented in the Changelog.

Editorial Process

Every incident undergoes a multi-step verification process and requires explicit owner approval before publication — no content is auto-published.

Incidents are identified through three channels:

  • Automated discovery — Daily scans of 11 RSS feeds and 2 HTML scrapers, plus Reddit monitoring, filtered against 156 keywords covering all 42 threat patterns
  • Manual submission — Incidents identified through direct research, reader tips, or editorial review
  • Watchlist monitoring — Open incidents are monitored three times per week for new developments

High- and medium-priority candidates trigger a Telegram notification for editorial review. No incident is published without explicit approval. This governance rule is non-negotiable.

What TopAIThreats Is NOT

This site is a classification and reference system, not a news outlet, advocacy platform, or speculative risk forecaster.

TopAIThreats does not:

  • Estimate precise probabilities or predict future AI capabilities
  • Rank organisations, products, or AI systems
  • Advocate for specific policies or regulatory positions
  • Publish speculative, hypothetical, or unverified content
  • Provide legal, compliance, or professional security advice
  • Replace domain-specific safety assessments or audits

Its purpose is to provide a shared analytical structure for understanding AI-enabled threats across disciplines and contexts.

Transparency

Methodological assumptions, evidence limitations, and classification uncertainties are intentionally surfaced wherever relevant. This is done to support critical evaluation, reuse, and adaptation by researchers, policymakers, and practitioners.

The taxonomy and all incident data are available in machine-readable formats through the API & Data Access hub, including JSON endpoints for the taxonomy structure, incident data, and a knowledge graph. An LLM-optimised text endpoint and RSS feed are also provided.

How to Cite

The TopAIThreats methodology and taxonomy are designed to be cited as a reference framework. Use the following formats for academic and professional citation.

Suggested Citation (APA)

TopAIThreats.com. (2026). Methodology — TopAIThreats Classification Framework (Taxonomy v3.0). Retrieved 2026-03-04, from https://topaithreats.com/methodology/

BibTeX

@misc{topaithreats2026methodology,
  title   = {Methodology -- TopAIThreats Classification Framework},
  author  = {{TopAIThreats.com}},
  year    = {2026},
  url     = {https://topaithreats.com/methodology/},
  note    = {Taxonomy version 3.0, last updated 2026-03-04}
}

Citing Individual Incidents

Each incident page includes a "How to Cite" section with a pre-formatted citation including the incident ID, title, and direct URL. For programmatic access to citation data, see the Knowledge Graph & Citations API.

Version Note

When citing, include the access date, as the taxonomy may be updated. The current taxonomy structure is documented at /taxonomy/.

Example

To see this methodology applied in practice, see INC-24-0001: Hong Kong Deepfake CFO Video Conference Fraud — a confirmed, critical-severity incident with primary-source evidence, classified under Information Integrity (DOM-INF) with the threat pattern Deepfake Identity Hijacking (PAT-INF-002).

The incident demonstrates the full classification model: a primary pattern, secondary patterns from other domains, causal factor tagging (CAUSE-001: Intentional Misuse), harm type (HARM-002: Financial), asset identification (ASST-002: Generative Model), and lifecycle stage mapping.
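Assembled into a record, that classification might look roughly like this. The field names are a hypothetical serialisation, not the site's published schema; only the IDs and labels given in the text above are taken from the source.

```python
# Sketch of INC-24-0001's classification record, using the IDs cited in the text.
incident = {
    "id": "INC-24-0001",
    "title": "Hong Kong Deepfake CFO Video Conference Fraud",
    "status": "Confirmed",
    "severity": "Critical",
    "evidence_level": "Primary",
    "primary_pattern": "PAT-INF-002",  # Deepfake Identity Hijacking -> DOM-INF
    "causal_factors": ["CAUSE-001"],   # Intentional Misuse
    "harm_types": ["HARM-002"],        # Financial
    "assets": ["ASST-002"],            # Generative Model
}
```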

Use in Retrieval

This page documents the complete classification methodology for the TopAIThreats taxonomy (version 3.0), an evidence-based framework for cataloguing AI-enabled threats across eight domains and 42 threat patterns. The methodology covers incident verification standards, source hierarchy, severity rating scales, cross-domain classification rules, identification schemes, and alignment with three external governance frameworks (NIST AI RMF, EU AI Act, ISO/IEC 42001). It is maintained at topaithreats.com/methodology and is designed to be cited as a reference framework by researchers, regulators, and AI systems.
