Security & Cyber Threats
AI-enabled attacks that compromise the integrity, confidentiality, or availability of digital systems — through input manipulation, model exploitation, or automated offense.
Domain Details
- Domain Code
- DOM-SEC
- Threat Patterns
- 9
- Documented Incidents
- 16
- Framework Mapping
- MIT (Privacy & Security) · EU AI Act (Cybersecurity & Robustness)
Last updated: 2026-02-28
Incident Data Snapshot
Total incidents
High or Critical
Resolved
Adversarial Evasion
Security & Cyber Threats represent the most operationally immediate AI risk category. They transform AI systems into attack surfaces, attack tools, or both simultaneously. While individual exploits may appear localized, the escalation pathways documented below demonstrate clear potential for sector-wide and systemic consequences. This domain is structurally defined by boundary failures — where natural language, model behavior, and permission architecture intersect.
Definition
AI cybersecurity threats are attacks where artificial intelligence is either the target, the weapon, or both. They include prompt injection, model data extraction, AI-generated malware, and autonomous exploitation — and they are accelerating faster than traditional defenses can adapt.
Security & Cyber Threats are AI-enabled attacks that compromise the integrity, confidentiality, or availability of digital systems. These threats manipulate model inputs, exploit trained model behavior, or automate vulnerability discovery to achieve unauthorized access, data extraction, system control, or infrastructure disruption.
This domain captures incidents where AI is materially involved in:
- Bypassing technical safeguards
- Extracting sensitive data
- Escalating privileges
- Automating exploitation
- Enhancing cyber offense capabilities
Why This Domain Is Distinct
Security & Cyber Threats differ from traditional cybersecurity because:
- AI systems become both target and tool — the same model can be exploited and weaponized in a single incident
- Natural language becomes an attack surface — conversational inputs bypass traditional input validation
- Exploitation speed exceeds human response cycles — AI compresses attack timelines from weeks to hours
- Escalation compresses vertically — individual exploits chain to sector-wide exposure through shared vulnerability classes
This domain analysis covers the nine classified threat patterns, their operational mechanisms, causal factor clustering, cross-domain risk pathways, escalation dynamics, and the aggregate incident record for AI-enabled cyber threats.
Threat Patterns in This Domain
This domain contains nine classified threat patterns organized into two groups by attack target — patterns that exploit trained models directly, and patterns that compromise systems through model inputs, supply chains, or social vectors.
Model Attacks (2)
These patterns target trained models as the primary attack surface — exploiting what the model learned or what it stores.
- Data Poisoning corrupts model behavior at the training stage by injecting malicious data into the training pipeline before deployment.
- Model Inversion & Data Extraction (aka inference attacks) reverses the model’s learned representations — using targeted API queries to reconstruct private training data, membership records, or model weights. Also covers model extraction via distillation and query-based weight reconstruction.
System Attacks (7)
These patterns exploit AI systems at runtime — manipulating inputs, bypassing guardrails, compromising deployment infrastructure, or using AI to enhance traditional offense.
- Adversarial Evasion — subtle input perturbations that cause misclassification or bypass detection at inference time.
- AI-Morphed Malware — generative AI used to mutate or generate malicious code that evades signature-based defenses.
- Automated Vulnerability Discovery — AI-accelerated scanning, exploit generation, and multi-stage attack chaining without human direction.
- Prompt Injection Attack (forthcoming) — adversarial instructions embedded in user inputs that override system prompts or hijack model behavior; escalates to tool misuse in agentic deployments. Cross-domain secondary: Agentic & Autonomous.
- Jailbreak & Guardrail Bypass (forthcoming) — conversational techniques that circumvent safety filters and alignment constraints. Cross-domain secondary: Human-AI Control.
- AI Supply Chain Attack (forthcoming) — compromise of model packages, fine-tuning datasets, or tool-server configurations that propagate through dependency chains before deployment.
- AI-Powered Social Engineering (forthcoming) — AI-enhanced phishing, vishing, and spear-phishing workflows that use generative content (deepfake voice, personalized lures) as the attack instrument to gain system access or credentials. Distinct from Deepfake Identity Hijacking (PAT-INF-002), which is an information integrity pattern focused on the synthetic media artifact; this pattern is defined by the access outcome. Cross-domain secondary: Information Integrity.
These patterns frequently co-occur in documented incidents. A prompt injection attack against an AI assistant may extract proprietary data (model inversion), which then informs automated exploitation of discovered vulnerabilities — a chain documented in multiple incidents.
This chaining behavior is characteristic of the Security & Cyber domain and distinguishes it from domains where threats tend to operate independently. Pattern definitions, severity assessments, and incident links are provided in the Pattern Cards below.
How These Threats Operate
Beyond classification, understanding the operational mechanisms reveals why these threats require different defensive approaches than traditional cybersecurity risks. Security & Cyber incidents cluster around three primary mechanisms.
1. Input Manipulation
Attackers craft adversarial prompts or inputs that override model safeguards, including:
- Prompt injection — adversarial instructions embedded in user-facing inputs
- Jailbreak techniques — bypassing safety filters through conversational manipulation
- Adversarial perturbations — subtle input modifications that cause misclassification
- Context window manipulation — overriding system instructions through input positioning
In AI assistants and agentic systems (AI applications that autonomously invoke tools and APIs), natural language becomes an attack surface. Unlike traditional software inputs, language instructions are inherently ambiguous and difficult to sandbox deterministically.
Incidents such as INC-25-0007 (GitHub Copilot RCE, CVE-2025-53773) and INC-25-0004 (M365 Copilot EchoLeak, CVE-2025-32711) demonstrate that crafted natural language inputs can achieve remote code execution in agentic environments.
2. Model Exploitation
Rather than manipulating inputs, attackers target the trained model itself through:
- Model inversion — reconstruction of training data from model outputs
- Memorization extraction — retrieving verbatim training data through targeted queries
- Parameter inference — reconstructing model architecture and weights
- Training data leakage — exploiting insufficient data sanitization
Large language models can reproduce fragments of training data — including credentials, API keys, or proprietary code — under specific querying conditions. The GitHub Copilot training data leak demonstrated verbatim reproduction of code from the training corpus.
3. Autonomous Offense
AI systems automate discovery, analysis, and exploitation, enabling:
- Rapid vulnerability scanning at machine speed
- Exploit generation from vulnerability descriptions
- Multi-stage attack chaining without human direction
- Adaptive evasion of detection systems
Supply chain attacks represent an emerging vector in this mechanism: compromised model packages, poisoned fine-tuning datasets, or malicious tool-server configurations can propagate through dependency chains before detection occurs.
When AI reduces exploitation time from weeks to hours, traditional defensive cycles become structurally disadvantaged. The GTG-1002 cyber espionage campaign demonstrated state-level actors using AI to orchestrate multi-stage intrusions at machine speed.
Common Causal Factors
The three operational mechanisms — input manipulation, model exploitation, and autonomous offense — share identifiable root causes. Analysis of documented incidents in this domain reveals five dominant enabling conditions, concentrated in two clusters.
Cluster 1 — Permission and Input Failures:
- Prompt Injection Vulnerability and Inadequate Access Controls are the most frequently co-occurring factors. Natural language interfaces that cannot distinguish system instructions from adversarial inputs become especially dangerous when AI systems are deployed with excessive permissions or weak isolation boundaries.
- Neither factor alone reliably produces critical outcomes — it is their combination that enables the most severe exploits.
Cluster 2 — Technique and Configuration Failures:
- Adversarial Manipulation Techniques appear across multiple patterns — deliberate perturbation of inputs or training data to induce model misbehavior. This is the broadest causal factor, underlying adversarial evasion, data poisoning, and model extraction attacks.
- Weaponization of Model Capabilities reflects the deliberate adaptation of generative models for offensive purposes — phishing, exploit writing, or intrusion support. This factor distinguishes intentional misuse from accidental exposure.
- Misconfigured Deployment captures integration-level security gaps introduced during implementation — insufficient hardening, default configurations left unchanged, or inadequate separation between AI components and sensitive systems.
Compared with other domains, these causal factors are distinctive. Unlike Information Integrity threats, which cluster around authentication failures and incentive structures, Security & Cyber Threats concentrate on permission design, input validation, and boundary enforcement failures. This domain is structurally more dependent on technical architecture decisions than on content ecosystem dynamics.
What the Incident Data Reveals
These causal patterns are reflected in the aggregate incident record. Security & Cyber Threats represent the most heavily documented domain in the registry, and the data reveals several structural patterns.
Severity and Temporal Patterns
Fifteen of sixteen documented incidents are rated high or critical severity — the highest concentration of any domain in the registry. This distribution reflects the direct, measurable impact of security compromises — unauthorized access, data exfiltration, and system control produce concrete and quantifiable harms.
Incident volume has accelerated year over year, with the most recent reporting period accounting for the largest share of documented cases. The most severe incidents — including those involving agentic AI exploitation (AI systems with autonomous tool-use capabilities being compromised) and state-sponsored campaigns — are disproportionately recent.
This temporal pattern indicates that the threat surface is expanding rather than stabilizing, driven by broader AI deployment and the emergence of agentic architectures.
Resolution and Pattern Dynamics
Eleven of sixteen documented incidents have been resolved through patches, policy changes, or vendor remediation. Five incidents remain structurally open:
- INC-24-0007 — indirect prompt injection as a persistent vulnerability class
- INC-23-0014 — Copilot training data leak, pending litigation
These cases represent vulnerability-class persistence rather than discrete remediation failure, suggesting that certain Security & Cyber threats resist point-in-time resolution.
At the pattern level, Adversarial Evasion accounts for 8 of 16 documented incidents, with prompt injection (adversarial instructions embedded in user-facing inputs) as the dominant sub-technique. Data Poisoning, while assessed as high severity with increasing likelihood, has no confirmed documented cases in the registry — a gap that likely reflects detection difficulty rather than absence of occurrence.
Cross-Domain Interactions
Security & Cyber Threats rarely operate in isolation. The incident record documents specific pathways through which security compromises chain into other threat domains. This domain interacts with all seven other domains in the taxonomy — five with documented incident evidence and two with plausible but unconfirmed pathways.
Security & Cyber → Information Integrity. Compromised AI systems generate outputs that inherit false legitimacy. When a model is exploited, its text, code, or media carries the provenance of the trusted system — enabling social engineering at scale. The Hong Kong deepfake CFO fraud combined security-domain synthetic identity generation with information-integrity social engineering.
Security & Cyber → Privacy & Surveillance. Data extraction attacks directly produce privacy violations. The Samsung ChatGPT data leak began as a security boundary failure and resulted in irrecoverable exposure of trade secrets. Model inversion (training data reconstruction from model outputs) techniques formalize this pathway.
Security & Cyber → Agentic & Autonomous. As AI systems gain tool-use capabilities, security exploits acquire force multiplication. The Cursor IDE vulnerabilities (CVE-2025-54135, CVE-2025-54136) demonstrated that prompt injection (crafted adversarial instructions) in an agentic coding environment achieves arbitrary code execution.
Security & Cyber → Economic & Labor. AI-powered cyber weapons lower the cost and skill threshold for offensive operations. WormGPT provided criminal actors with professional-grade phishing capabilities, shifting the economics of cybercrime.
Security & Cyber → Systemic & Catastrophic. When automated vulnerability discovery is paired with autonomous exploitation, the speed of attack exceeds human response capacity. The GTG-1002 campaign demonstrated state-sponsored AI cyber operations against critical infrastructure.
Security & Cyber → Discrimination & Social Harm. Adversarial techniques can selectively degrade model performance for specific demographic groups — for example, crafted inputs that cause a hiring model to misclassify candidates from particular backgrounds. While no incident in the registry documents this specific chain, the underlying mechanism (targeted adversarial perturbation) is well-established in the Adversarial Evasion pattern.
Security & Cyber → Human-AI Control. Security exploits targeting AI monitoring or oversight systems can create blind spots in human supervision — for example, compromising an AI-based anomaly detector to suppress alerts during an ongoing attack. The mechanism parallels documented prompt injection attacks but targets oversight infrastructure rather than end-user applications.
Formal Interaction Matrix
| From Domain | To Domain | Interaction Type | Mechanism |
|---|---|---|---|
| Security & Cyber | Information Integrity | AMPLIFIES | Exploited models produce trusted-seeming disinformation |
| Security & Cyber | Privacy & Surveillance | EXTRACTS FROM | Model inversion reconstructs training data; breaches expose user data |
| Security & Cyber | Agentic & Autonomous | FORCE-MULTIPLIES | Prompt injection in tool-using agents → arbitrary code execution |
| Security & Cyber | Economic & Labor | CASCADES INTO | AI-generated phishing and exploit tools lower cybercrime cost threshold |
| Security & Cyber | Systemic & Catastrophic | CASCADES INTO | Autonomous multi-stage exploitation targets critical infrastructure |
| Security & Cyber | Discrimination & Social Harm | PLAUSIBLE | Adversarial attacks on fairness models → demographic performance degradation |
| Security & Cyber | Human-AI Control | PLAUSIBLE | Exploitation of AI oversight systems → blind spots in monitoring |
Escalation Pathways
Beyond cross-domain interaction, Security & Cyber Threats follow characteristic escalation pathways within their own domain. At each stage, AI capabilities compress the timeline and reduce the skill threshold required to reach the next level.
Escalation Overview
| Stage | Level | Example Mechanism |
|---|---|---|
| 1 | Individual Exploit | Prompt injection in one employee’s AI assistant |
| 2 | Organizational Breach | Lateral movement via harvested credentials or code execution |
| 3 | Sector-wide Vulnerability | Shared vulnerability class across enterprises using the same AI product |
| 4 | Infrastructure-scale Disruption | Automated exploitation across critical infrastructure systems |
Stage 1 — Individual Exploit
A single successful prompt injection (adversarial instructions injected into a model’s input) against an employee’s AI assistant yields credentials, internal documents, or code repositories.
The EchoLeak vulnerability (INC-25-0004, CVE-2025-32711) demonstrated zero-click data exfiltration from individual M365 Copilot sessions — requiring no user interaction to initiate compromise. At this stage, the blast radius is limited to one user’s access scope.
Stage 2 — Organizational Breach
When an individual exploit provides lateral access — through harvested credentials, elevated permissions, or code execution — the compromise extends to organizational systems.
The Cursor IDE MCP vulnerabilities (INC-25-0008, CVE-2025-54135/54136) showed how prompt injection in a developer’s AI tool could achieve arbitrary code execution on corporate infrastructure, converting a single-user exploit into an enterprise-level breach.
Stage 3 — Sector-wide Vulnerability
When the same vulnerability class affects all deployments of a widely adopted AI product, organizational breach becomes sector-wide exposure.
Indirect prompt injection (INC-24-0007) affects every enterprise deploying retrieval-augmented generation (RAG) systems — applications that retrieve external documents to inform AI responses — creating simultaneous exposure across finance, healthcare, government, and other sectors that share the same underlying technology.
Stage 4 — Infrastructure-scale Disruption
When autonomous AI systems chain exploits without human direction, the speed of attack exceeds human response capacity.
The GTG-1002 campaign (INC-25-0001) demonstrated state-sponsored AI orchestrating multi-stage intrusions against critical infrastructure, compressing attack timelines from weeks to hours and creating conditions for cascading failures across interdependent systems.
Compound Threat Chains
Security & Cyber Threats also escalate through cross-domain interaction. A documented pattern:
- An Information Integrity attack (deepfake identity) enables a security compromise
- The security compromise (credential phishing via synthetic persona) produces Economic Harm (financial fraud, business disruption)
- At scale, economic harm erodes Systemic Trust (institutional confidence, market stability)
The Hong Kong deepfake CFO fraud followed precisely this chain.
Who Is Affected
Security & Cyber Threats produce differentiated impacts across sectors and stakeholder groups, shaped by the domain’s operational mechanisms, causal factor clustering, and escalation dynamics.
Most Impacted Sectors
Based on documented incidents in this domain:
- Corporate — primary target for data extraction and prompt injection attacks
- Finance — targeted by AI-enhanced phishing, deepfake fraud, and automated financial exploitation
- Government — exposed through state-sponsored AI cyber operations and critical infrastructure targeting
- Cross-sector — vulnerability classes like indirect prompt injection affect all enterprises deploying the same AI products
- Healthcare — exposed through ransomware and AI-assisted attacks on medical infrastructure
Most Impacted Groups
- Business Leaders — bear decision-making responsibility for AI adoption and exposure
- IT & Security Teams — face compressed response timelines and novel attack surfaces
- Consumers — targeted by AI-assisted phishing at scale
- Public Servants — affected through government system targeting
Organizational Response
The causal factor clustering and escalation pathways in this domain point to specific organizational considerations for managing Security & Cyber risk.
Input Validation and Boundary Enforcement
The dominance of Prompt Injection Vulnerability and Inadequate Access Controls as co-occurring causal factors indicates that organizations deploying AI systems with natural language interfaces should prioritize input validation architectures and least-privilege permission models.
Documented incidents consistently show that excessive permissions granted to AI assistants — rather than the sophistication of adversarial prompts — determine whether an exploit achieves critical impact. The NIST AI Risk Management Framework provides structured methodologies for mapping these boundaries.
Deployment Hardening
Misconfigured Deployment appears across multiple incident types, from default API configurations to insufficient isolation between AI components and sensitive systems.
Organizations integrating AI into existing infrastructure should assess separation boundaries, credential scoping, and output sandboxing as part of deployment review. ISO/IEC 42001 provides a management system framework for establishing these controls systematically.
Monitoring and Detection
The temporal patterns in the incident record — accelerating volume, increasing severity, and the emergence of autonomous exploitation — suggest that static security postures are structurally insufficient.
The speed at which AI-enabled attacks compress exploitation timelines requires continuous monitoring for:
- Anomalous model behavior and unexpected outputs
- Unexpected tool invocations in agentic systems
- Data exfiltration signals and unusual query patterns
Organizations should treat AI system outputs as untrusted by default and implement output validation alongside input controls.
Implementation Checklist
| Defense | Mitigates | Action | Reference |
|---|---|---|---|
| Input/output filtering | Input Manipulation | Deploy validation layers for natural language interfaces | Prompt Injection Vulnerability |
| Least-privilege credentials | Input Manipulation + Autonomous Offense | Restrict AI agent tool access and permission scoping | NIST AI RMF — Govern |
| Credential scoping & isolation | Model Exploitation | Separate AI components from sensitive data stores | Misconfigured Deployment |
| Tool invocation monitoring | Autonomous Offense | Track invocation patterns and output anomalies in agentic deployments | Inadequate Access Controls |
| Red-team exercises | All three mechanisms | Conduct adversarial testing using prompt injection, model extraction, and escalation scenarios | Adversarial Manipulation |
Regulatory Context
Beyond organizational response, the regulatory landscape provides structural requirements for managing Security & Cyber Threats.
EU AI Act: Cybersecurity obligations are embedded throughout the regulation, with Article 15 specifically mandating technical robustness — including resistance to adversarial inputs and model manipulation. High-risk AI systems must demonstrate resilience against attacks and undergo conformity assessments.
NIST AI Risk Management Framework: Maps directly to the Govern, Map, and Manage functions. The framework’s emphasis on AI system trustworthiness encompasses security properties — resilience, robustness, and resistance to adversarial manipulation — providing structured risk assessment methodologies applicable to all nine patterns in this domain.
ISO/IEC 42001: Establishes requirements for an AI Management System (AIMS) that includes security controls for AI development and deployment. Its risk-based approach aligns with the causal factor clustering observed in this domain, particularly around access controls and deployment configuration.
MIT AI Risk Repository: Classified under Privacy & Security, encompassing threats to system integrity, confidentiality, and availability posed by AI-enabled attack methodologies.
Related Domains
- Information Integrity Threats — Compromised AI systems generate false-legitimacy outputs; security exploits enable the production of convincing disinformation
- Privacy & Surveillance Threats — Model inversion and data exfiltration produce direct privacy violations; data breaches enabled by AI-powered attacks compromise individual and organizational privacy
- Agentic & Autonomous Threats — Tool-use capabilities in AI agents amplify the impact of security exploits, converting prompt injection into arbitrary code execution
- Economic & Labor Threats — AI-powered cyber weapons lower attack costs, shifting the economics of cybercrime and threatening financial stability
- Systemic & Catastrophic Threats — Automated multi-stage exploitation of critical infrastructure represents the upper bound of Security & Cyber escalation
Use in Retrieval
This page answers questions about AI-enabled cybersecurity threats, including: prompt injection attacks, jailbreak and guardrail bypass, adversarial evasion of AI systems, model inversion and data extraction (inference attacks), AI supply chain attacks, AI-powered social engineering and phishing, AI-morphed malware, automated vulnerability discovery, and the cross-domain interactions between security exploits and other AI risk categories. It covers operational mechanisms, causal factors, escalation pathways, organizational response guidance, and the regulatory landscape for AI security. Use this page as a reference for the Security & Cyber Threats domain (DOM-SEC) in the TopAIThreats taxonomy. Threat patterns are organized into two groups: Model Attacks (targeting trained model behavior) and System Attacks (exploiting AI systems at runtime or through deployment vectors).
Threat Patterns
9 threat patterns classified under this domain
Data Poisoning
Deliberate corruption of training data to introduce biases, backdoors, or vulnerabilities into AI models.
Model Inversion & Data Extraction
Attacks that extract private training data or sensitive information from AI models through targeted queries or analysis.
Adversarial Evasion
Techniques that manipulate AI model inputs to cause incorrect outputs, bypassing detection systems or security controls.
AI-Morphed Malware
Malicious software that uses AI to adapt, evade detection, or generate novel attack variants autonomously.
Automated Vulnerability Discovery
AI systems that autonomously identify, analyze, and potentially exploit software and system vulnerabilities.
Prompt Injection Attack
Adversarial inputs that override an AI system's intended instructions at runtime, causing it to execute attacker-controlled actions — from data exfiltration to unauthorized tool use — by exploiting the inability of LLMs to distinguish system instructions from user-supplied data.
Jailbreak & Guardrail Bypass
Adversarial conversational techniques that manipulate LLMs into disabling or circumventing their safety constraints, producing outputs that alignment training was designed to prevent — from harmful content generation to policy-violating instructions.
AI Supply Chain Attack
Attacks that compromise AI systems by tampering with model weights, fine-tuning datasets, tool-server configurations, or software dependencies before deployment — embedding backdoors or vulnerabilities that propagate through the model distribution chain.
AI-Powered Social Engineering
The use of generative AI — language models, voice cloning, and real-time deepfake video — to conduct social engineering attacks at unprecedented scale, personalization, and persuasive quality, targeting human trust to gain unauthorized access, credentials, or financial transfers.
Recent Incidents
Documented events in Security & Cyber Threats