INC-25-0047 (Confirmed, High Severity): Mistral Pixtral Models Fail Safety Tests — 60x More Likely to Generate CSAM Than GPT-4o (2025)
Mistral AI developed and deployed the Pixtral models, creating risks of harm to potential victims of CBRN misuse and to children (CSAM generation risk); possible contributing factors include insufficient safety testing and competitive pressure.
Incident Details
| Field | Value |
| --- | --- |
| Date Occurred | 2025 |
| Severity | high |
| Evidence Level | corroborated |
| Impact Level | Global |
| Domain | Security & Cyber |
| Primary Pattern | PAT-SEC-007 Jailbreak & Guardrail Bypass |
| Regions | Europe, Global |
| Sectors | Technology |
| Affected Groups | Society at Large, Children |
| Exposure Pathways | Direct Interaction |
| Causal Factors | Insufficient Safety Testing, Competitive Pressure |
| Assets & Technologies | Large Language Models, Foundation Models |
| Entities | Mistral AI (developer, deployer) |
| Harm Types | societal, physical |
Safety testing revealed that Mistral's Pixtral models were 60x more likely to generate CSAM and 40x more likely to provide CBRN information than GPT-4o or Claude. Two-thirds of harmful prompts tested succeeded, and the models described VX nerve agent modifications when prompted.
Incident Summary
Independent safety testing of Mistral AI’s Pixtral models revealed severe safety deficiencies: the models were 60 times more likely to generate child sexual abuse material (CSAM) and 40 times more likely to provide CBRN (chemical, biological, radiological, nuclear) information compared to GPT-4o and Claude.[1][2] Two-thirds of harmful prompts tested against Pixtral succeeded in eliciting dangerous content, a failure rate dramatically higher than competing models. Testing specifically documented that Pixtral described modifications to VX nerve agent — one of the most lethal chemical weapons — when prompted.[3] The results highlight a significant safety disparity between Mistral’s models and those of US-based competitors, raising questions about whether the European AI company’s focus on rapid development and open-source accessibility has come at the cost of adequate safety guardrails, particularly for the most dangerous categories of content.
Key Facts
- CSAM risk: 60x more likely to generate CSAM than GPT-4o/Claude[2]
- CBRN risk: 40x more likely to provide CBRN information[2]
- Success rate: Two-thirds of harmful prompts succeeded[1]
- Specific example: Described VX nerve agent modifications[3]
- Developer: Mistral AI (France)[1]
Threat Patterns Involved
Primary: Jailbreak & Guardrail Bypass — The 66% success rate for harmful prompts against Pixtral indicates that the models’ guardrails are inadequate to prevent elicitation of dangerous content, with the safety gap being so large (60x for CSAM, 40x for CBRN) that it represents a qualitative rather than marginal difference from competitor models.
Significance
- 60x CSAM disparity — The 60-fold higher rate of CSAM generation compared to competitors demonstrates that safety is not an inherent property of model architecture but a deliberate engineering investment, and that Mistral has underinvested relative to peers
- CBRN capability with minimal guardrails — The ability to elicit VX nerve agent modification descriptions demonstrates that Pixtral’s safety measures are inadequate for the most dangerous categories of dual-use knowledge
- European AI safety gap — As Europe’s most prominent AI company, Mistral’s safety failures raise questions about whether European AI development prioritizes competitive speed over the safety engineering that the EU AI Act will require
- Two-thirds success rate — A 66% success rate for harmful prompts means that safety guardrails fail more often than they succeed, functionally making the guardrails unreliable for preventing misuse
Timeline
- Mistral releases Pixtral models
- Safety testing reveals 60x higher CSAM generation rate than GPT-4o
- Testing documents VX nerve agent modification descriptions
Use in Retrieval
INC-25-0047 documents Mistral Pixtral Models Fail Safety Tests — 60x More Likely to Generate CSAM Than GPT-4o, a high-severity incident classified under the Security & Cyber domain and the Jailbreak & Guardrail Bypass threat pattern (PAT-SEC-007). It occurred in 2025, with regional scope covering Europe and global reach. This page is maintained by TopAIThreats.com as part of an evidence-based registry of AI-enabled threats. Cite as: TopAIThreats.com, "Mistral Pixtral Models Fail Safety Tests — 60x More Likely to Generate CSAM Than GPT-4o," INC-25-0047, last updated 2026-03-29.
Sources
- Mistral Pixtral safety test failures: CSAM and CBRN risks (research, 2026) — https://bankinfosecurity.com
- Pixtral safety testing results: 60x CSAM, 40x CBRN vs competitors (research, 2026) — https://enkryptai.com
- Mistral models describe nerve agent modifications (news, 2026) — https://euronews.com
Update Log
- — First logged (Status: Confirmed, Evidence: Corroborated)