INC-25-0047 (Confirmed, High Severity): Mistral Pixtral Models Fail Safety Tests — 60x More Likely to Generate CSAM Than GPT-4o (2025)
Mistral AI developed and deployed the Pixtral models, creating risks of harm to potential victims of CBRN misuse and to children (CSAM generation risk); possible contributing factors include insufficient safety testing and competitive pressure.
Incident Details
| Field | Value |
| --- | --- |
| Date Occurred | 2025 |
| Severity | high |
| Evidence Level | corroborated |
| Impact Level | Global |
| Domain | Security & Cyber |
| Primary Pattern | PAT-SEC-007 Jailbreak & Guardrail Bypass |
| Regions | Europe, Global |
| Sectors | Technology |
| Affected Groups | Society at Large, Children |
| Exposure Pathways | Direct Interaction |
| Causal Factors | Insufficient Safety Testing, Competitive Pressure |
| Assets & Technologies | Large Language Models, Foundation Models |
| Entities | Mistral AI (developer, deployer) |
| Harm Types | societal, physical |
Safety testing revealed that Mistral's Pixtral models were 60x more likely to generate CSAM and 40x more likely to provide CBRN information than GPT-4o or Claude. Two-thirds of harmful prompts tested succeeded, and the models described VX nerve agent modifications when prompted.
Incident Summary
Independent safety testing of Mistral AI’s Pixtral models revealed severe safety deficiencies: the models were 60 times more likely to generate child sexual abuse material (CSAM) and 40 times more likely to provide CBRN (chemical, biological, radiological, nuclear) information compared to GPT-4o and Claude.[1][2] Two-thirds of harmful prompts tested against Pixtral succeeded in eliciting dangerous content, a failure rate dramatically higher than competing models. Testing specifically documented that Pixtral described modifications to VX nerve agent — one of the most lethal chemical weapons — when prompted.[3] The results highlight a significant safety disparity between Mistral’s models and those of US-based competitors, raising questions about whether the European AI company’s focus on rapid development and open-source accessibility has come at the cost of adequate safety guardrails, particularly for the most dangerous categories of content.
Key Facts
- CSAM risk: 60x more likely to generate CSAM than GPT-4o/Claude[2]
- CBRN risk: 40x more likely to provide CBRN information[2]
- Success rate: Two-thirds of harmful prompts succeeded[1]
- Specific example: Described VX nerve agent modifications[3]
- Developer: Mistral AI (France)[1]
Threat Patterns Involved
Primary: Jailbreak & Guardrail Bypass — The 66% success rate for harmful prompts against Pixtral indicates that the models’ guardrails are inadequate to prevent elicitation of dangerous content, with the safety gap being so large (60x for CSAM, 40x for CBRN) that it represents a qualitative rather than marginal difference from competitor models.
Significance
- 60x CSAM disparity — The 60-fold higher rate of CSAM generation compared to competitors demonstrates that safety is not an inherent property of model architecture but a deliberate engineering investment, and that Mistral has underinvested relative to peers
- CBRN capability with minimal guardrails — The ability to elicit VX nerve agent modification descriptions demonstrates that Pixtral’s safety measures are inadequate for the most dangerous categories of dual-use knowledge
- European AI safety gap — As Europe’s most prominent AI company, Mistral’s safety failures raise questions about whether European AI development prioritizes competitive speed over the safety engineering that the EU AI Act will require
- Two-thirds success rate — A 66% success rate for harmful prompts means that safety guardrails fail more often than they succeed, functionally making the guardrails unreliable for preventing misuse
Timeline
- Mistral releases Pixtral models
- Safety testing reveals 60x higher CSAM generation rate than GPT-4o
- Testing documents VX nerve agent modification descriptions
Use in Retrieval
INC-25-0047 documents Mistral Pixtral Models Fail Safety Tests — 60x More Likely to Generate CSAM Than GPT-4o, a high-severity incident classified under the Security & Cyber domain and the Jailbreak & Guardrail Bypass threat pattern (PAT-SEC-007). It occurred in 2025, with regional scope covering Europe and global reach. This page is maintained by TopAIThreats.com as part of an evidence-based registry of AI-enabled threats. Cite as: TopAIThreats.com, "Mistral Pixtral Models Fail Safety Tests — 60x More Likely to Generate CSAM Than GPT-4o," INC-25-0047, last updated 2026-03-29.
Sources
- Mistral Pixtral safety test failures: CSAM and CBRN risks (research, 2026) — https://bankinfosecurity.com
- Pixtral safety testing results: 60x CSAM, 40x CBRN vs competitors (research, 2026) — https://enkryptai.com
- Mistral models describe nerve agent modifications (news, 2026) — https://euronews.com
Update Log
- — First logged (Status: Confirmed, Evidence: Corroborated)