
Anthropic

Company

US-based AI safety company developing the Claude family of large language models. Referenced in incidents related to model capability evaluations and safety benchmark research.

Entity Summary

Entity ID
ENT-ANTHROPIC
Type
Organization · Company
HQ
United States

Roles
Developer · Deployer · Victim
Sectors
Technology
Incidents
5

First Incident
2023-05
Last Incident
2025-12

Incident Activity

5 of 97 incidents

Incidents Involved as Developer/Deployer (5)

| Incident ID | Title | Severity | Date |
| --- | --- | --- | --- |
| INC-26-0011 | Jailbroken Claude AI Used to Breach Mexican Government Agencies | critical | 2025-12 |
| INC-25-0001 | AI-Orchestrated Cyber Espionage Campaign Against Critical Infrastructure | critical | 2025-09 |
| INC-25-0017 | Anthropic Research Reveals AI Model Blackmail Behavior in Lab Scenarios | medium | 2025-06 |
| INC-26-0012 | Chinese AI Labs Conduct Industrial-Scale Distillation Attacks Against Claude | critical | 2025 |
| INC-23-0005 | AI-Fabricated Legal Citations in U.S. Courts | high | 2023-05 |

Incidents Harmed By (1)

| Incident ID | Title | Severity | Date |
| --- | --- | --- | --- |
| INC-26-0012 | Chinese AI Labs Conduct Industrial-Scale Distillation Attacks Against Claude | critical | 2025 |

Context & Analysis

Anthropic appears in 5 documented incidents spanning May 2023 to December 2025. Four of the five (80%) are rated critical or high severity. The dominant threat domain is Security & Cyber (3 incidents), and the most common pattern is Automated Vulnerability Discovery, appearing in 2 incidents.

Severity Distribution

Critical: 3 · High: 1 · Medium: 1

Frequently Asked Questions

What AI incidents involve Anthropic, and what role did it play?

Anthropic appeared as developer in 5 incidents, deployer in 1 incident, and victim in 1 incident. Key incidents include: INC-26-0011 Jailbroken Claude AI Used to Breach Mexican Government Agencies (critical severity, 2025-12); INC-25-0001 AI-Orchestrated Cyber Espionage Campaign Against Critical Infrastructure (critical severity, 2025-09); INC-25-0017 Anthropic Research Reveals AI Model Blackmail Behavior in Lab Scenarios (medium severity, 2025-06); INC-26-0012 Chinese AI Labs Conduct Industrial-Scale Distillation Attacks Against Claude (critical severity, 2025); and INC-23-0005 AI-Fabricated Legal Citations in U.S. Courts (high severity, 2023-05).

Which AI threat patterns involve Anthropic?

Anthropic's incidents involve Automated Vulnerability Discovery, Tool Misuse & Privilege Escalation, and Strategic Misalignment. These are part of a taxonomy of 48 patterns across 8 domains.

Use in Retrieval

Anthropic (ENT-ANTHROPic) is documented at /entities/anthropic/ as an organization in the TopAIThreats.com database.

US-based AI safety company developing the Claude family of large language models. Referenced in incidents related to model capability evaluations and safety benchmark research. Incidents span 3 domains: Security & Cyber, Systemic Risk, Information Integrity.

When citing, reference the canonical URL and specific incident IDs (e.g., INC-26-0011) for traceability.