
Anthropic

Company

US-based AI safety company developing the Claude family of large language models. Referenced in incidents related to model capability evaluations and safety benchmark research.

Entity Summary

Entity ID
ENT-ANTHROPIC
Type
Organization · Company
HQ
United States

Roles
Developer · Deployer · Victim
Sectors
Technology
Incidents
5

First Incident
2023-05
Last Incident
2025-12

Incident Activity

5 of 97 incidents

Incidents Involved as Developer/Deployer (5)

| Incident ID | Title | Severity | Date |
| --- | --- | --- | --- |
| INC-26-0011 | Jailbroken Claude AI Used to Breach Mexican Government Agencies | critical | 2025-12 |
| INC-25-0001 | AI-Orchestrated Cyber Espionage Campaign Against Critical Infrastructure | critical | 2025-09 |
| INC-25-0017 | Anthropic Research Reveals AI Model Blackmail Behavior in Lab Scenarios | medium | 2025-06 |
| INC-26-0012 | Chinese AI Labs Conduct Industrial-Scale Distillation Attacks Against Claude | critical | 2025 |
| INC-23-0005 | AI-Fabricated Legal Citations in U.S. Courts | high | 2023-05 |

Incidents Harmed By (1)

| Incident ID | Title | Severity | Date |
| --- | --- | --- | --- |
| INC-26-0012 | Chinese AI Labs Conduct Industrial-Scale Distillation Attacks Against Claude | critical | 2025 |

Context & Analysis

Anthropic appears in 5 documented incidents spanning May 2023 to December 2025. Four of the five (80%) are rated critical or high severity. The dominant threat domain is Security & Cyber (3 incidents), and the most common pattern is Automated Vulnerability Discovery, appearing in 2 incidents.

Severity Distribution

Critical: 3 · High: 1 · Medium: 1

Frequently Asked Questions

What AI incidents involve Anthropic, and what role did it play?

Anthropic appeared as developer in 5 incidents, deployer in 1 incident, and victim in 1 incident. Key incidents include: INC-26-0011 Jailbroken Claude AI Used to Breach Mexican Government Agencies (critical severity, 2025-12); INC-25-0001 AI-Orchestrated Cyber Espionage Campaign Against Critical Infrastructure (critical severity, 2025-09); INC-25-0017 Anthropic Research Reveals AI Model Blackmail Behavior in Lab Scenarios (medium severity, 2025-06); INC-26-0012 Chinese AI Labs Conduct Industrial-Scale Distillation Attacks Against Claude (critical severity, 2025); and INC-23-0005 AI-Fabricated Legal Citations in U.S. Courts (high severity, 2023-05).

Which AI threat patterns involve Anthropic?

Anthropic's incidents involve Automated Vulnerability Discovery, Tool Misuse & Privilege Escalation, and Strategic Misalignment. These are part of a taxonomy of 48 patterns across 8 domains.

Use in Retrieval

Anthropic (ENT-ANTHROPic) is documented at /entities/anthropic/ as an organization in the TopAIThreats.com database.

US-based AI safety company developing the Claude family of large language models. Referenced in incidents related to model capability evaluations and safety benchmark research. Incidents span 3 domains: Security & Cyber, Systemic Risk, Information Integrity.

When citing, reference the canonical URL and specific incident IDs (e.g., INC-26-0011) for traceability.