Skip to main content
TopAIThreats home TOP AI THREATS

Threat Patterns

48 empirically grounded threat patterns across 8 domains. Each pattern describes a concrete mechanism through which AI systems cause or enable harm.

Hierarchy: Domain → Pattern → Incident

Domains: 8 | Patterns: 48

Machine-readable: /api/threats.json

Threats caused by AI systems that act independently, persist over time, or coordinate with other systems.

PAT-AGT-001 Agent-to-Agent Propagation

Harmful behaviors, errors, or malicious instructions that spread between interconnected AI agents, amplifying damage beyond the originating system.

high
PAT-AGT-002 Cascading Hallucinations

AI-generated false information that propagates through chains of AI systems, with each system treating the previous system's hallucinated output as authoritative input.

medium
PAT-AGT-003 Goal Drift

AI agents that gradually deviate from their intended objectives over time, pursuing emergent sub-goals or optimizing for proxy metrics that diverge from human intent.

high
PAT-AGT-004 Memory Poisoning

Attacks or failures that corrupt an AI agent's persistent memory, context, or learned preferences, causing it to act on false information or compromised instructions across sessions.

high
PAT-AGT-005 Multi-Agent Coordination Failures

Harmful outcomes arising when multiple AI agents interact in unexpected ways, creating emergent behaviors that none were individually designed to produce.

medium
PAT-AGT-007 Specification Gaming

AI agents that achieve their stated objective through unintended means — exploiting loopholes, ambiguities, or proxy metrics in their specification rather than pursuing the outcome the designer intended — a phenomenon formalized as Goodhart's Law applied to AI systems.

high
PAT-AGT-006 Tool Misuse & Privilege Escalation

AI agents that exceed their intended permissions, misuse available tools, or escalate their own privileges to accomplish goals beyond their authorized scope.

high

Threats arising from how humans rely on, defer to, or lose control over AI systems.

Threats that distort markets, labor conditions, or the distribution of economic power.

Threats that undermine the reliability, authenticity, or shared understanding of information.

Threats involving unauthorized inference, tracking, or monitoring of individuals or groups.

AI-enabled attacks that compromise the integrity, confidentiality, or availability of digital systems — through input manipulation, model exploitation, or automated offense.

PAT-SEC-001 Adversarial Evasion

Techniques that manipulate AI model inputs to cause incorrect outputs, bypassing detection systems or security controls.

high
PAT-SEC-008 AI Supply Chain Attack

Attacks that compromise AI systems by tampering with model weights, fine-tuning datasets, tool-server configurations, or software dependencies before deployment — embedding backdoors or vulnerabilities that propagate through the model distribution chain.

high
PAT-SEC-002 AI-Morphed Malware

Malicious software that uses AI to adapt, evade detection, or generate novel attack variants autonomously.

critical
PAT-SEC-009 AI-Powered Social Engineering

The use of generative AI — language models, voice cloning, and real-time deepfake video — to conduct social engineering attacks at unprecedented scale, personalization, and persuasive quality, targeting human trust to gain unauthorized access, credentials, or financial transfers.

high
PAT-SEC-003 Automated Vulnerability Discovery

AI systems that autonomously identify, analyze, and potentially exploit software and system vulnerabilities.

medium
PAT-SEC-004 Data Poisoning

Deliberate corruption of training data to introduce biases, backdoors, or vulnerabilities into AI models.

high
PAT-SEC-007 Jailbreak & Guardrail Bypass

Adversarial conversational techniques that manipulate LLMs into disabling or circumventing their safety constraints, producing outputs that alignment training was designed to prevent — from harmful content generation to policy-violating instructions.

high
PAT-SEC-005 Model Inversion & Data Extraction

Attacks that extract private training data or sensitive information from AI models through targeted queries or analysis.

high
PAT-SEC-006 Prompt Injection Attack

Adversarial inputs that override an AI system's intended instructions at runtime, causing it to execute attacker-controlled actions — from data exfiltration to unauthorized tool use — by exploiting the inability of LLMs to distinguish system instructions from user-supplied data.

high

Threats that result in unfair treatment, exclusion, or social harm to individuals or groups.

Threats that emerge from scale, coupling, and accumulation rather than single failures.

8 domains · 48 patterns · View full taxonomy · View domains