Data Collection Methodology
Last updated: 2026-03-15
Overview
This page documents how incident data is discovered, collected, verified, and maintained in the TopAIThreats database. The process combines automated source monitoring, manual editorial research, and ongoing watchlist review. Every incident that enters the database has passed a structured six-step verification process and received explicit owner approval — no content is published automatically.
The full incident verification standards, source hierarchy, and rating definitions are described on the Methodology page. This page focuses on how data enters and is maintained within that system.
How Incidents Are Discovered
Incidents enter the review pipeline through three channels: automated monitoring, manual research, and watchlist-triggered re-review of existing incidents. All three feed into the same verification and approval process.
Automated Monitoring
An automated discovery system monitors news sources, research publications, and community reporting daily. When an article or report matches the keyword criteria for one or more of the 42 threat patterns, it is flagged as a candidate and scored for relevance.
Flagged candidates are pre-filtered by a large language model that evaluates each item against the incident inclusion criteria and assigns one of four verdicts:
| Verdict | Meaning | Next Step |
|---|---|---|
| Publish | Strong candidate — AI materially involved, harm or signal documented | Telegram alert to editorial team for review |
| Verify | Plausible but needs human assessment — borderline AI involvement or unclear harm | Telegram alert flagged for closer review |
| Reject | Does not meet inclusion criteria — AI incidental, no harm, speculative | Logged and discarded |
| Duplicate | Relates to an incident already in the database | Merged with existing incident record if new information is present |
The LLM pre-filter is an efficiency tool for triage, not a classification system. All final decisions — including whether to publish, how to classify, and what failure stage to assign — are made by human reviewers.
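In code terms, the triage step reduces to a small dispatch on the four verdicts. The sketch below is illustrative only; the names (Verdict, route_candidate) and the queue labels are assumptions, not the production system:

```python
from enum import Enum

class Verdict(Enum):
    """The four pre-filter verdicts from the table above."""
    PUBLISH = "publish"      # strong candidate: alert the editorial team
    VERIFY = "verify"        # borderline: alert editors, flagged for closer review
    REJECT = "reject"        # fails inclusion criteria: log and discard
    DUPLICATE = "duplicate"  # matches an existing incident: merge if new info

def route_candidate(verdict: Verdict, candidate_id: str) -> str:
    """Map a verdict to its next queue. Every path that can reach the
    database still ends in human review; the verdict only decides how
    the candidate is queued."""
    if verdict in (Verdict.PUBLISH, Verdict.VERIFY):
        return f"telegram-alert:{candidate_id}"   # editors decide, not the LLM
    if verdict is Verdict.DUPLICATE:
        return f"merge-review:{candidate_id}"     # merged only if new information
    return f"rejection-log:{candidate_id}"        # logged and discarded
```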
Manual Research
Incidents are also identified through direct editorial research: targeted searches in response to emerging events, review of regulatory filings and enforcement actions, monitoring of academic publication channels, and tips or suggestions from contributors. Manual research is the primary channel for incidents from regulatory and legal sources, which are less likely to surface through keyword-based monitoring.
Watchlist Monitoring
Open incidents — those with a resolution status of open — are monitored on a recurring basis for new developments. When a news article or report relates to a known open incident, the editorial team is notified with a summary of the new information and a suggested update. Updates to existing incidents follow the same approval process as new incident publication.
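A minimal sketch of one watchlist pass, assuming hypothetical matching and notification helpers (matches, summarise, and notify_editors are placeholders, not real components):

```python
def watchlist_pass(articles, open_incidents, summarise, notify_editors):
    """One recurring pass: match newly collected articles against known
    open incidents and notify editors with a summary and a suggested
    update. Nothing is applied automatically; any resulting change
    goes through the normal approval process."""
    for article in articles:
        for incident in open_incidents:
            if incident.matches(article):  # e.g. entity/keyword overlap (hypothetical)
                notify_editors(incident.id, summarise(article))
```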
Incident Inclusion Criteria
An incident qualifies for inclusion when both of the following conditions are satisfied:
- AI is a material component — the AI system caused, enabled, or materially amplified the event. AI being present in the technology stack is not sufficient; it must have changed the nature, scale, or feasibility of the harm.
- At least one of the following is documented:
- Real-world harm (financial, physical, psychological, privacy, reputational, or societal)
- A verified system failure with credible risk of harm (near-miss)
- A capability demonstration in a real or near-real-world context indicating a dangerous failure mode (signal)
- A structural threat pattern emerging across multiple incidents (systemic risk)
There is no minimum financial threshold for harm. A single instance of targeted harassment or a rights violation affecting one individual qualifies if it meets the AI materiality requirement and has credible documentation. Scale affects severity rating, not inclusion.
Each incident requires at least one source meeting the minimum tier requirements described in the Source Hierarchy. Incidents supported only by unverified or discovery-tier sources are not published.
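Read as a conjunction, the test is compact. The sketch below assumes a hypothetical Candidate record with boolean evidence fields; the minimum_tier threshold is a placeholder, since the actual tier requirements are defined in the Source Hierarchy:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    ai_material: bool          # AI caused, enabled, or materially amplified the event
    documented_harm: bool      # real-world harm described in the sources
    verified_near_miss: bool   # system failure with credible risk of harm
    capability_signal: bool    # dangerous failure mode in a (near-)real-world context
    systemic_pattern: bool     # structural pattern emerging across incidents
    best_source_tier: int      # 1 (strongest) onwards, per the Source Hierarchy

def qualifies(c: Candidate, minimum_tier: int = 3) -> bool:
    """Both conditions must hold, and at least one source must meet the
    minimum tier. Scale affects severity rating, not inclusion."""
    evidence = (c.documented_harm or c.verified_near_miss
                or c.capability_signal or c.systemic_pattern)
    return c.ai_material and evidence and c.best_source_tier <= minimum_tier
```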
Verification Process
Before any incident is published, it undergoes a six-step verification process carried out by multiple reviewers:
1. Source check — At least one source meets tier requirements, is accessible, and is independently verifiable. URL and access date are recorded.
2. Scope check — AI is confirmed to be materially involved. The incident is not purely speculative, theoretical, or a general software failure.
3. Harm check — Real-world harm is demonstrated, or a credible near-miss or signal is documented with evidence. The harm or risk is not inferred; it is described in the source material.
4. Rating assignment — Four ratings are assigned (full definitions are at Rating Definitions):
   - Status: confirmed (primary source or two independent credible sources), alleged (single credible report), or under investigation (active inquiry pending)
   - Severity: critical (large-scale harm or critical infrastructure), high (significant harm or multiple victims), medium (confirmed but limited scope), or low (proof-of-concept or minor impact)
   - Evidence level: primary (direct official confirmation), corroborated (multiple independent sources), or single-source (one credible report awaiting corroboration)
   - Failure stage: signal, near miss, harm, or systemic risk
5. Classification — Primary domain and threat pattern are assigned based on the dominant harm mechanism. Secondary patterns from other domains may be added where an incident spans multiple mechanisms. Causal factors, assets, lifecycle stages, and contextual tags (sectors, regions, affected groups, exposure pathways) are applied. Source trust is evaluated against the five-tier Source Hierarchy, from Tier 1 (official regulatory and legal) to Tier 5 (expert commentary); Tier 6 sources are never used as evidence.
6. Content review — Language is checked for neutrality and accuracy. Sources are cited with inline superscripts. No editorialising, speculation, or advocacy language is present.
For Confirmed status, at least one Tier 1 source or two or more independent Tier 2–3 sources are required. If only one credible source exists, the incident is published as Alleged and monitored for corroboration. Edge cases — incidents where AI materiality is unclear, harm is contested, or sources conflict — are held pending additional review rather than published at reduced confidence.
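The Confirmed/Alleged rule can be expressed directly. This is a sketch under the assumption that each source is represented by its tier number; the function name and the held outcome for unusable evidence are illustrative, and it ignores the separate under-investigation status, which depends on an active inquiry rather than source tiers:

```python
def derive_status(source_tiers: list[int]) -> str:
    """Corroboration rule: Confirmed needs one Tier 1 source or at
    least two independent Tier 2-3 sources; a single credible source
    yields Alleged, monitored for corroboration."""
    credible = [t for t in source_tiers if 1 <= t <= 5]  # Tier 6 is never evidence
    if 1 in credible:
        return "confirmed"
    if sum(1 for t in credible if t in (2, 3)) >= 2:
        return "confirmed"
    if credible:
        return "alleged"
    return "held"  # no usable evidence: held for review, not published
```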
Final publication requires explicit owner approval. This is a non-negotiable governance rule: no incident enters the live database without it.
Keeping Data Current
The database is a living record. Incidents are updated — never silently edited or deleted — when new information becomes available. The following triggers initiate a re-review of an existing incident:
- A new source confirms, expands, or contradicts the documented facts
- The resolution status changes (e.g. a legal case concludes, an investigation closes)
- The severity assessment changes as the full scale of harm becomes known
- A regulatory action or legal outcome is published
- The watchlist monitoring system flags a relevant new article
All changes are appended to the incident's update log with a date and description. Earlier versions of the classification are preserved in the update history. The update log is visible on every incident page.
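As a data-structure sketch (field names are hypothetical), the update log is append-only, with each entry carrying a date, a description, and a snapshot of the classification at the time of the change:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class UpdateEntry:
    on: date
    description: str      # what changed and why
    classification: dict  # snapshot of ratings/classification at this point

@dataclass
class IncidentRecord:
    incident_id: str
    update_log: list[UpdateEntry] = field(default_factory=list)

    def record_update(self, description: str, classification: dict) -> None:
        """Append, never overwrite: earlier classifications stay in the history."""
        self.update_log.append(
            UpdateEntry(date.today(), description, dict(classification)))
```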
Incidents that have not been updated for 60 days (open) or 90 days (resolved) are flagged for a staleness review to confirm the record remains accurate.
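The staleness thresholds translate to a one-line check. Only the 60- and 90-day figures come from the policy above; the rest is a sketch with illustrative names:

```python
from datetime import date, timedelta

# Thresholds from the staleness policy; keys and names are illustrative.
STALE_AFTER = {"open": timedelta(days=60), "resolved": timedelta(days=90)}

def needs_staleness_review(status: str, last_updated: date,
                           today: date | None = None) -> bool:
    """Flag a record whose last update is older than its threshold."""
    today = today or date.today()
    return (today - last_updated) > STALE_AFTER[status]
```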
Known Data Gaps and Limitations
The following limitations affect the completeness and representativeness of the incident database. They are documented here as a transparency measure, not a disclaimer.
- Language coverage — Source monitoring is conducted primarily in English. Incidents that were reported only in other languages are likely underrepresented. This affects coverage of events in East Asia, South Asia, Latin America, and the Middle East in particular.
- Reporting bias — Incidents that attract media coverage, regulatory attention, or legal proceedings are more likely to enter the database than harms experienced privately or in contexts where reporting is unlikely (e.g. domestic settings, non-public institutional environments, countries with limited press freedom).
- Corporate non-disclosure — Many AI-enabled harms are handled internally by organisations without public disclosure. The database reflects the publicly documented record, not the full universe of incidents.
- Emerging harm mechanisms — Novel threat patterns take time to be recognised and documented. There is an inherent lag between a new harm mechanism appearing in the world and it being sufficiently evidenced to warrant a new pattern or incident entry.
- Source availability over time — Source URLs referenced in incident records may become unavailable (link rot). Where this occurs, the citation is retained with the original access date and URL; the incident is not removed.
These limitations reflect the practical constraints of evidence-based documentation rather than choices about scope. Where coverage gaps are identified, they are noted within the relevant domain or pattern pages.