Skip to main content
TopAIThreats home TOP AI THREATS
DOM-SEC

Security & Cyber Threats

AI-enabled attacks that compromise the integrity, confidentiality, or availability of digital systems — through input manipulation, model exploitation, or automated offense.

Incident Data Snapshot

16
Total incidents
94%
High or Critical
69%
Resolved
50%
Adversarial Evasion
View all 16 incidents →

Security & Cyber Threats represent the most operationally immediate AI risk category. They transform AI systems into attack surfaces, attack tools, or both simultaneously. While individual exploits may appear localized, the escalation pathways documented below demonstrate clear potential for sector-wide and systemic consequences. This domain is structurally defined by boundary failures — where natural language, model behavior, and permission architecture intersect.

Definition

AI cybersecurity threats are attacks where artificial intelligence is either the target, the weapon, or both. They include prompt injection, model data extraction, AI-generated malware, and autonomous exploitation — and they are accelerating faster than traditional defenses can adapt.

Security & Cyber Threats are AI-enabled attacks that compromise the integrity, confidentiality, or availability of digital systems. These threats manipulate model inputs, exploit trained model behavior, or automate vulnerability discovery to achieve unauthorized access, data extraction, system control, or infrastructure disruption.

This domain captures incidents where AI is materially involved in:

  • Bypassing technical safeguards
  • Extracting sensitive data
  • Escalating privileges
  • Automating exploitation
  • Enhancing cyber offense capabilities

Why This Domain Is Distinct

Security & Cyber Threats differ from traditional cybersecurity because:

  1. AI systems become both target and tool — the same model can be exploited and weaponized in a single incident
  2. Natural language becomes an attack surface — conversational inputs bypass traditional input validation
  3. Exploitation speed exceeds human response cycles — AI compresses attack timelines from weeks to hours
  4. Escalation compresses vertically — individual exploits chain to sector-wide exposure through shared vulnerability classes

This domain analysis covers the nine classified threat patterns, their operational mechanisms, causal factor clustering, cross-domain risk pathways, escalation dynamics, and the aggregate incident record for AI-enabled cyber threats.

Threat Patterns in This Domain

This domain contains nine classified threat patterns organized into two groups by attack target — patterns that exploit trained models directly, and patterns that compromise systems through model inputs, supply chains, or social vectors.

Model Attacks (2)

These patterns target trained models as the primary attack surface — exploiting what the model learned or what it stores.

  1. Data Poisoning corrupts model behavior at the training stage by injecting malicious data into the training pipeline before deployment.
  2. Model Inversion & Data Extraction (aka inference attacks) reverses the model’s learned representations — using targeted API queries to reconstruct private training data, membership records, or model weights. Also covers model extraction via distillation and query-based weight reconstruction.

System Attacks (7)

These patterns exploit AI systems at runtime — manipulating inputs, bypassing guardrails, compromising deployment infrastructure, or using AI to enhance traditional offense.

  1. Adversarial Evasion — subtle input perturbations that cause misclassification or bypass detection at inference time.
  2. AI-Morphed Malware — generative AI used to mutate or generate malicious code that evades signature-based defenses.
  3. Automated Vulnerability Discovery — AI-accelerated scanning, exploit generation, and multi-stage attack chaining without human direction.
  4. Prompt Injection Attack (forthcoming) — adversarial instructions embedded in user inputs that override system prompts or hijack model behavior; escalates to tool misuse in agentic deployments. Cross-domain secondary: Agentic & Autonomous.
  5. Jailbreak & Guardrail Bypass (forthcoming) — conversational techniques that circumvent safety filters and alignment constraints. Cross-domain secondary: Human-AI Control.
  6. AI Supply Chain Attack (forthcoming) — compromise of model packages, fine-tuning datasets, or tool-server configurations that propagate through dependency chains before deployment.
  7. AI-Powered Social Engineering (forthcoming) — AI-enhanced phishing, vishing, and spear-phishing workflows that use generative content (deepfake voice, personalized lures) as the attack instrument to gain system access or credentials. Distinct from Deepfake Identity Hijacking (PAT-INF-002), which is an information integrity pattern focused on the synthetic media artifact; this pattern is defined by the access outcome. Cross-domain secondary: Information Integrity.

These patterns frequently co-occur in documented incidents. A prompt injection attack against an AI assistant may extract proprietary data (model inversion), which then informs automated exploitation of discovered vulnerabilities — a chain documented in multiple incidents.

This chaining behavior is characteristic of the Security & Cyber domain and distinguishes it from domains where threats tend to operate independently. Pattern definitions, severity assessments, and incident links are provided in the Pattern Cards below.

How These Threats Operate

Beyond classification, understanding the operational mechanisms reveals why these threats require different defensive approaches than traditional cybersecurity risks. Security & Cyber incidents cluster around three primary mechanisms.

1. Input Manipulation

Attackers craft adversarial prompts or inputs that override model safeguards, including:

  • Prompt injection — adversarial instructions embedded in user-facing inputs
  • Jailbreak techniques — bypassing safety filters through conversational manipulation
  • Adversarial perturbations — subtle input modifications that cause misclassification
  • Context window manipulation — overriding system instructions through input positioning

In AI assistants and agentic systems (AI applications that autonomously invoke tools and APIs), natural language becomes an attack surface. Unlike traditional software inputs, language instructions are inherently ambiguous and difficult to sandbox deterministically.

Incidents such as INC-25-0007 (GitHub Copilot RCE, CVE-2025-53773) and INC-25-0004 (M365 Copilot EchoLeak, CVE-2025-32711) demonstrate that crafted natural language inputs can achieve remote code execution in agentic environments.

2. Model Exploitation

Rather than manipulating inputs, attackers target the trained model itself through:

  • Model inversion — reconstruction of training data from model outputs
  • Memorization extraction — retrieving verbatim training data through targeted queries
  • Parameter inference — reconstructing model architecture and weights
  • Training data leakage — exploiting insufficient data sanitization

Large language models can reproduce fragments of training data — including credentials, API keys, or proprietary code — under specific querying conditions. The GitHub Copilot training data leak demonstrated verbatim reproduction of code from the training corpus.

3. Autonomous Offense

AI systems automate discovery, analysis, and exploitation, enabling:

  • Rapid vulnerability scanning at machine speed
  • Exploit generation from vulnerability descriptions
  • Multi-stage attack chaining without human direction
  • Adaptive evasion of detection systems

Supply chain attacks represent an emerging vector in this mechanism: compromised model packages, poisoned fine-tuning datasets, or malicious tool-server configurations can propagate through dependency chains before detection occurs.

When AI reduces exploitation time from weeks to hours, traditional defensive cycles become structurally disadvantaged. The GTG-1002 cyber espionage campaign demonstrated state-level actors using AI to orchestrate multi-stage intrusions at machine speed.

Common Causal Factors

The three operational mechanisms — input manipulation, model exploitation, and autonomous offense — share identifiable root causes. Analysis of documented incidents in this domain reveals five dominant enabling conditions, concentrated in two clusters.

Cluster 1 — Permission and Input Failures:

  • Prompt Injection Vulnerability and Inadequate Access Controls are the most frequently co-occurring factors. Natural language interfaces that cannot distinguish system instructions from adversarial inputs become especially dangerous when AI systems are deployed with excessive permissions or weak isolation boundaries.
  • Neither factor alone reliably produces critical outcomes — it is their combination that enables the most severe exploits.

Cluster 2 — Technique and Configuration Failures:

  • Adversarial Manipulation Techniques appear across multiple patterns — deliberate perturbation of inputs or training data to induce model misbehavior. This is the broadest causal factor, underlying adversarial evasion, data poisoning, and model extraction attacks.
  • Weaponization of Model Capabilities reflects the deliberate adaptation of generative models for offensive purposes — phishing, exploit writing, or intrusion support. This factor distinguishes intentional misuse from accidental exposure.
  • Misconfigured Deployment captures integration-level security gaps introduced during implementation — insufficient hardening, default configurations left unchanged, or inadequate separation between AI components and sensitive systems.

Compared with other domains, these causal factors are distinctive. Unlike Information Integrity threats, which cluster around authentication failures and incentive structures, Security & Cyber Threats concentrate on permission design, input validation, and boundary enforcement failures. This domain is structurally more dependent on technical architecture decisions than on content ecosystem dynamics.

What the Incident Data Reveals

These causal patterns are reflected in the aggregate incident record. Security & Cyber Threats represent the most heavily documented domain in the registry, and the data reveals several structural patterns.

Severity and Temporal Patterns

Fifteen of sixteen documented incidents are rated high or critical severity — the highest concentration of any domain in the registry. This distribution reflects the direct, measurable impact of security compromises — unauthorized access, data exfiltration, and system control produce concrete and quantifiable harms.

Incident volume has accelerated year over year, with the most recent reporting period accounting for the largest share of documented cases. The most severe incidents — including those involving agentic AI exploitation (AI systems with autonomous tool-use capabilities being compromised) and state-sponsored campaigns — are disproportionately recent.

This temporal pattern indicates that the threat surface is expanding rather than stabilizing, driven by broader AI deployment and the emergence of agentic architectures.

Resolution and Pattern Dynamics

Eleven of sixteen documented incidents have been resolved through patches, policy changes, or vendor remediation. Five incidents remain structurally open:

  • INC-24-0007 — indirect prompt injection as a persistent vulnerability class
  • INC-23-0014 — Copilot training data leak, pending litigation

These cases represent vulnerability-class persistence rather than discrete remediation failure, suggesting that certain Security & Cyber threats resist point-in-time resolution.

At the pattern level, Adversarial Evasion accounts for 8 of 16 documented incidents, with prompt injection (adversarial instructions embedded in user-facing inputs) as the dominant sub-technique. Data Poisoning, while assessed as high severity with increasing likelihood, has no confirmed documented cases in the registry — a gap that likely reflects detection difficulty rather than absence of occurrence.

Cross-Domain Interactions

Security & Cyber Threats rarely operate in isolation. The incident record documents specific pathways through which security compromises chain into other threat domains. This domain interacts with all seven other domains in the taxonomy — five with documented incident evidence and two with plausible but unconfirmed pathways.

Security & Cyber → Information Integrity. Compromised AI systems generate outputs that inherit false legitimacy. When a model is exploited, its text, code, or media carries the provenance of the trusted system — enabling social engineering at scale. The Hong Kong deepfake CFO fraud combined security-domain synthetic identity generation with information-integrity social engineering.

Security & Cyber → Privacy & Surveillance. Data extraction attacks directly produce privacy violations. The Samsung ChatGPT data leak began as a security boundary failure and resulted in irrecoverable exposure of trade secrets. Model inversion (training data reconstruction from model outputs) techniques formalize this pathway.

Security & Cyber → Agentic & Autonomous. As AI systems gain tool-use capabilities, security exploits acquire force multiplication. The Cursor IDE vulnerabilities (CVE-2025-54135, CVE-2025-54136) demonstrated that prompt injection (crafted adversarial instructions) in an agentic coding environment achieves arbitrary code execution.

Security & Cyber → Economic & Labor. AI-powered cyber weapons lower the cost and skill threshold for offensive operations. WormGPT provided criminal actors with professional-grade phishing capabilities, shifting the economics of cybercrime.

Security & Cyber → Systemic & Catastrophic. When automated vulnerability discovery is paired with autonomous exploitation, the speed of attack exceeds human response capacity. The GTG-1002 campaign demonstrated state-sponsored AI cyber operations against critical infrastructure.

Security & Cyber → Discrimination & Social Harm. Adversarial techniques can selectively degrade model performance for specific demographic groups — for example, crafted inputs that cause a hiring model to misclassify candidates from particular backgrounds. While no incident in the registry documents this specific chain, the underlying mechanism (targeted adversarial perturbation) is well-established in the Adversarial Evasion pattern.

Security & Cyber → Human-AI Control. Security exploits targeting AI monitoring or oversight systems can create blind spots in human supervision — for example, compromising an AI-based anomaly detector to suppress alerts during an ongoing attack. The mechanism parallels documented prompt injection attacks but targets oversight infrastructure rather than end-user applications.

Formal Interaction Matrix

From DomainTo DomainInteraction TypeMechanism
Security & CyberInformation IntegrityAMPLIFIESExploited models produce trusted-seeming disinformation
Security & CyberPrivacy & SurveillanceEXTRACTS FROMModel inversion reconstructs training data; breaches expose user data
Security & CyberAgentic & AutonomousFORCE-MULTIPLIESPrompt injection in tool-using agents → arbitrary code execution
Security & CyberEconomic & LaborCASCADES INTOAI-generated phishing and exploit tools lower cybercrime cost threshold
Security & CyberSystemic & CatastrophicCASCADES INTOAutonomous multi-stage exploitation targets critical infrastructure
Security & CyberDiscrimination & Social HarmPLAUSIBLEAdversarial attacks on fairness models → demographic performance degradation
Security & CyberHuman-AI ControlPLAUSIBLEExploitation of AI oversight systems → blind spots in monitoring

Escalation Pathways

Beyond cross-domain interaction, Security & Cyber Threats follow characteristic escalation pathways within their own domain. At each stage, AI capabilities compress the timeline and reduce the skill threshold required to reach the next level.

Escalation Overview

StageLevelExample Mechanism
1Individual ExploitPrompt injection in one employee’s AI assistant
2Organizational BreachLateral movement via harvested credentials or code execution
3Sector-wide VulnerabilityShared vulnerability class across enterprises using the same AI product
4Infrastructure-scale DisruptionAutomated exploitation across critical infrastructure systems

Stage 1 — Individual Exploit

A single successful prompt injection (adversarial instructions injected into a model’s input) against an employee’s AI assistant yields credentials, internal documents, or code repositories.

The EchoLeak vulnerability (INC-25-0004, CVE-2025-32711) demonstrated zero-click data exfiltration from individual M365 Copilot sessions — requiring no user interaction to initiate compromise. At this stage, the blast radius is limited to one user’s access scope.

Stage 2 — Organizational Breach

When an individual exploit provides lateral access — through harvested credentials, elevated permissions, or code execution — the compromise extends to organizational systems.

The Cursor IDE MCP vulnerabilities (INC-25-0008, CVE-2025-54135/54136) showed how prompt injection in a developer’s AI tool could achieve arbitrary code execution on corporate infrastructure, converting a single-user exploit into an enterprise-level breach.

Stage 3 — Sector-wide Vulnerability

When the same vulnerability class affects all deployments of a widely adopted AI product, organizational breach becomes sector-wide exposure.

Indirect prompt injection (INC-24-0007) affects every enterprise deploying retrieval-augmented generation (RAG) systems — applications that retrieve external documents to inform AI responses — creating simultaneous exposure across finance, healthcare, government, and other sectors that share the same underlying technology.

Stage 4 — Infrastructure-scale Disruption

When autonomous AI systems chain exploits without human direction, the speed of attack exceeds human response capacity.

The GTG-1002 campaign (INC-25-0001) demonstrated state-sponsored AI orchestrating multi-stage intrusions against critical infrastructure, compressing attack timelines from weeks to hours and creating conditions for cascading failures across interdependent systems.

Compound Threat Chains

Security & Cyber Threats also escalate through cross-domain interaction. A documented pattern:

  1. An Information Integrity attack (deepfake identity) enables a security compromise
  2. The security compromise (credential phishing via synthetic persona) produces Economic Harm (financial fraud, business disruption)
  3. At scale, economic harm erodes Systemic Trust (institutional confidence, market stability)

The Hong Kong deepfake CFO fraud followed precisely this chain.

Who Is Affected

Security & Cyber Threats produce differentiated impacts across sectors and stakeholder groups, shaped by the domain’s operational mechanisms, causal factor clustering, and escalation dynamics.

Most Impacted Sectors

Based on documented incidents in this domain:

  1. Corporate — primary target for data extraction and prompt injection attacks
  2. Finance — targeted by AI-enhanced phishing, deepfake fraud, and automated financial exploitation
  3. Government — exposed through state-sponsored AI cyber operations and critical infrastructure targeting
  4. Cross-sector — vulnerability classes like indirect prompt injection affect all enterprises deploying the same AI products
  5. Healthcare — exposed through ransomware and AI-assisted attacks on medical infrastructure

Most Impacted Groups

  1. Business Leaders — bear decision-making responsibility for AI adoption and exposure
  2. IT & Security Teams — face compressed response timelines and novel attack surfaces
  3. Consumers — targeted by AI-assisted phishing at scale
  4. Public Servants — affected through government system targeting

Organizational Response

The causal factor clustering and escalation pathways in this domain point to specific organizational considerations for managing Security & Cyber risk.

Input Validation and Boundary Enforcement

The dominance of Prompt Injection Vulnerability and Inadequate Access Controls as co-occurring causal factors indicates that organizations deploying AI systems with natural language interfaces should prioritize input validation architectures and least-privilege permission models.

Documented incidents consistently show that excessive permissions granted to AI assistants — rather than the sophistication of adversarial prompts — determine whether an exploit achieves critical impact. The NIST AI Risk Management Framework provides structured methodologies for mapping these boundaries.

Deployment Hardening

Misconfigured Deployment appears across multiple incident types, from default API configurations to insufficient isolation between AI components and sensitive systems.

Organizations integrating AI into existing infrastructure should assess separation boundaries, credential scoping, and output sandboxing as part of deployment review. ISO/IEC 42001 provides a management system framework for establishing these controls systematically.

Monitoring and Detection

The temporal patterns in the incident record — accelerating volume, increasing severity, and the emergence of autonomous exploitation — suggest that static security postures are structurally insufficient.

The speed at which AI-enabled attacks compress exploitation timelines requires continuous monitoring for:

  • Anomalous model behavior and unexpected outputs
  • Unexpected tool invocations in agentic systems
  • Data exfiltration signals and unusual query patterns

Organizations should treat AI system outputs as untrusted by default and implement output validation alongside input controls.

Implementation Checklist

DefenseMitigatesActionReference
Input/output filteringInput ManipulationDeploy validation layers for natural language interfacesPrompt Injection Vulnerability
Least-privilege credentialsInput Manipulation + Autonomous OffenseRestrict AI agent tool access and permission scopingNIST AI RMF — Govern
Credential scoping & isolationModel ExploitationSeparate AI components from sensitive data storesMisconfigured Deployment
Tool invocation monitoringAutonomous OffenseTrack invocation patterns and output anomalies in agentic deploymentsInadequate Access Controls
Red-team exercisesAll three mechanismsConduct adversarial testing using prompt injection, model extraction, and escalation scenariosAdversarial Manipulation

Regulatory Context

Beyond organizational response, the regulatory landscape provides structural requirements for managing Security & Cyber Threats.

EU AI Act: Cybersecurity obligations are embedded throughout the regulation, with Article 15 specifically mandating technical robustness — including resistance to adversarial inputs and model manipulation. High-risk AI systems must demonstrate resilience against attacks and undergo conformity assessments.

NIST AI Risk Management Framework: Maps directly to the Govern, Map, and Manage functions. The framework’s emphasis on AI system trustworthiness encompasses security properties — resilience, robustness, and resistance to adversarial manipulation — providing structured risk assessment methodologies applicable to all nine patterns in this domain.

ISO/IEC 42001: Establishes requirements for an AI Management System (AIMS) that includes security controls for AI development and deployment. Its risk-based approach aligns with the causal factor clustering observed in this domain, particularly around access controls and deployment configuration.

MIT AI Risk Repository: Classified under Privacy & Security, encompassing threats to system integrity, confidentiality, and availability posed by AI-enabled attack methodologies.

  • Information Integrity Threats — Compromised AI systems generate false-legitimacy outputs; security exploits enable the production of convincing disinformation
  • Privacy & Surveillance Threats — Model inversion and data exfiltration produce direct privacy violations; data breaches enabled by AI-powered attacks compromise individual and organizational privacy
  • Agentic & Autonomous Threats — Tool-use capabilities in AI agents amplify the impact of security exploits, converting prompt injection into arbitrary code execution
  • Economic & Labor Threats — AI-powered cyber weapons lower attack costs, shifting the economics of cybercrime and threatening financial stability
  • Systemic & Catastrophic Threats — Automated multi-stage exploitation of critical infrastructure represents the upper bound of Security & Cyber escalation

Use in Retrieval

This page answers questions about AI-enabled cybersecurity threats, including: prompt injection attacks, jailbreak and guardrail bypass, adversarial evasion of AI systems, model inversion and data extraction (inference attacks), AI supply chain attacks, AI-powered social engineering and phishing, AI-morphed malware, automated vulnerability discovery, and the cross-domain interactions between security exploits and other AI risk categories. It covers operational mechanisms, causal factors, escalation pathways, organizational response guidance, and the regulatory landscape for AI security. Use this page as a reference for the Security & Cyber Threats domain (DOM-SEC) in the TopAIThreats taxonomy. Threat patterns are organized into two groups: Model Attacks (targeting trained model behavior) and System Attacks (exploiting AI systems at runtime or through deployment vectors).

Threat Patterns

9 threat patterns classified under this domain

PAT-SEC-004

Data Poisoning

high

Deliberate corruption of training data to introduce biases, backdoors, or vulnerabilities into AI models.

Likelihood: increasing
PAT-SEC-005

Model Inversion & Data Extraction

high

Attacks that extract private training data or sensitive information from AI models through targeted queries or analysis.

Likelihood: stable
PAT-SEC-001

Adversarial Evasion

high

Techniques that manipulate AI model inputs to cause incorrect outputs, bypassing detection systems or security controls.

Likelihood: increasing
PAT-SEC-002

AI-Morphed Malware

critical

Malicious software that uses AI to adapt, evade detection, or generate novel attack variants autonomously.

Likelihood: increasing
PAT-SEC-003

Automated Vulnerability Discovery

medium

AI systems that autonomously identify, analyze, and potentially exploit software and system vulnerabilities.

Likelihood: increasing
PAT-SEC-006

Prompt Injection Attack

high

Adversarial inputs that override an AI system's intended instructions at runtime, causing it to execute attacker-controlled actions — from data exfiltration to unauthorized tool use — by exploiting the inability of LLMs to distinguish system instructions from user-supplied data.

Likelihood: increasing
PAT-SEC-007

Jailbreak & Guardrail Bypass

high

Adversarial conversational techniques that manipulate LLMs into disabling or circumventing their safety constraints, producing outputs that alignment training was designed to prevent — from harmful content generation to policy-violating instructions.

Likelihood: increasing
PAT-SEC-008

AI Supply Chain Attack

high

Attacks that compromise AI systems by tampering with model weights, fine-tuning datasets, tool-server configurations, or software dependencies before deployment — embedding backdoors or vulnerabilities that propagate through the model distribution chain.

Likelihood: increasing
PAT-SEC-009

AI-Powered Social Engineering

high

The use of generative AI — language models, voice cloning, and real-time deepfake video — to conduct social engineering attacks at unprecedented scale, personalization, and persuasive quality, targeting human trust to gain unauthorized access, credentials, or financial transfers.

Likelihood: increasing

Recent Incidents

Documented events in Security & Cyber Threats

ID Title Severity
INC-26-0011 Jailbroken Claude AI Used to Breach Mexican Government Agencies critical
INC-25-0001 AI-Orchestrated Cyber Espionage Campaign Against Critical Infrastructure critical
INC-25-0007 GitHub Copilot Remote Code Execution via Prompt Injection (CVE-2025-53773) critical
INC-25-0008 Cursor IDE MCP Vulnerabilities Enable Remote Code Execution (CurXecute & MCPoison) high
INC-25-0005 ChatGPT Jailbreak Reveals Windows Product Keys via Game Prompt medium
INC-25-0004 EchoLeak: Zero-Click Prompt Injection in Microsoft 365 Copilot (CVE-2025-32711) critical
INC-25-0024 Microsoft Reports Blocking $4 Billion in AI-Enabled Fraud Attempts high
INC-25-0018 Las Vegas Cybertruck Bomber Used ChatGPT for Explosives Information critical
INC-26-0012 Chinese AI Labs Conduct Industrial-Scale Distillation Attacks Against Claude critical
INC-24-0020 Slack AI Indirect Prompt Injection Data Exfiltration Vulnerability high
INC-24-0022 McDonald's McHire AI Hiring Platform Data Vulnerability high
INC-24-0007 Indirect Prompt Injection Attacks on LLM-Integrated Applications high
INC-23-0006 WormGPT: AI-Powered Business Email Compromise Tool high
INC-23-0002 Samsung Semiconductor Trade Secret Leak via ChatGPT high
INC-23-0016 Bing Chat (Sydney) System Prompt Exposure via Prompt Injection high
INC-23-0014 GitHub Copilot Reproduces Verbatim Training Data Including Secrets high