INC-25-0023 (confirmed, medium severity): 'Vegetative Electron Microscopy' Nonsense Phrase Contaminates Scientific Literature via AI (2020)
OpenAI developed GPT-3 and subsequent large language models, which authors and paper mills deployed through AI writing tools for scientific manuscripts. The resulting contamination harmed scientific journals that published affected papers and researchers who rely on the integrity of the scholarly record. Contributing factors included hallucination tendency and training data bias.
Incident Details
| Date Occurred | 2020-01 | Severity | medium |
| Evidence Level | corroborated | Impact Level | Sector |
| Domain | Information Integrity | ||
| Primary Pattern | PAT-INF-004 Misinformation & Hallucinated Content | ||
| Regions | global | ||
| Sectors | Education, Healthcare | ||
| Affected Groups | Society at Large, Developers & AI Builders | ||
| Exposure Pathways | Direct Interaction | ||
| Causal Factors | Hallucination Tendency, Training Data Bias | ||
| Assets & Technologies | Large Language Models, Content Platforms | ||
| Entities | OpenAI (developer), Authors and paper mills using AI writing tools for scientific manuscripts (deployer), Springer Nature (victim), Elsevier (victim) | ||
| Harm Types | reputational, operational | ||
The nonsense phrase 'vegetative electron microscopy' — originating from a 1950s OCR scanning error that merged text across two columns — appeared in at least 22 scientific papers. Investigations by Retraction Watch and researchers Guillaume Cabanac and Cyril Labbé traced its spread through a chain: OCR error → digital databases → a Farsi near-homograph confusion (2017–2019) → AI training data (GPT-3 onward). The phrase now serves as a fingerprint for AI-generated or paper-mill-produced manuscripts, undermining trust in parts of the scholarly record.
Incident Summary
The nonsense phrase “vegetative electron microscopy” — a term with no basis in any real scientific methodology — appeared in at least 22 scientific papers indexed on Google Scholar, undermining trust in parts of the scholarly record.[1]
Investigations by Retraction Watch journalists and researchers Guillaume Cabanac (Université de Toulouse) and Cyril Labbé (Université Grenoble Alpes) traced the phrase’s origin to a 1950s paper in Bacteriological Reviews. When the paper was later digitized, OCR software confused the two-column layout, merging “vegetative” from the left column with “electron microscopy” from the right. The error entered digital databases, was reinforced by a Farsi near-homograph confusion (the Persian words for “vegetative” and “scanning” differ by a single dot), and was ultimately absorbed into AI training data.[2]
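The column-merging step described above can be illustrated with a minimal sketch. The two-column page below is invented for illustration and is not the text of the Bacteriological Reviews paper; the gutter width and the function names `read_across` and `read_by_column` are likewise assumptions. It only shows how reading straight across a two-column layout can fuse the end of a left-column line with the start of a right-column line.

```python
# Hypothetical two-column page: each string is one printed line, with the
# left and right columns separated by a gutter of spaces. The wording is
# invented for illustration and is NOT the text of the 1950s paper.
PAGE = [
    "the spore coat of the vegetative     electron microscopy of thin",
    "cell is shed during outgrowth        sections shows dense layers",
]

GUTTER = "   "  # assumption: three or more spaces mark the column gap


def split_columns(line):
    """Split one printed line into its left and right column text."""
    left, _, right = line.partition(GUTTER)
    return left.strip(), right.strip()


def read_across(page):
    """Naive reading: ignore the gutter and read each line left to right."""
    return " ".join(" ".join(line.split()) for line in page)


def read_by_column(page):
    """Column-aware reading: finish the left column, then the right."""
    columns = [split_columns(line) for line in page]
    left = " ".join(l for l, _ in columns)
    right = " ".join(r for _, r in columns)
    return left + " " + right


if __name__ == "__main__":
    print(read_across(PAGE))     # fuses text across the gutter
    print(read_by_column(PAGE))  # keeps each column's sentence intact
```

In this toy layout the naive reading produces the fused phrase, while the column-aware reading keeps "vegetative cell" and "electron microscopy of thin sections" in separate sentences.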
Testing confirmed that GPT-3 consistently reproduced the phrase, and the error persists in GPT-4o and other current models. Researchers describe it as a “digital fossil” — an error now embedded in AI knowledge bases that is “nearly impossible to remove.” The phrase has become a recognized fingerprint for AI-generated or paper-mill-produced manuscripts, joining a list of approximately 4,000 “tortured phrases” tracked by Cabanac’s Problematic Paper Screener.[3]
Key Facts
- Origin: 1950s OCR digitization error merging text across two columns of a Bacteriological Reviews paper
- Propagation chain: OCR error → digital databases → Farsi near-homograph confusion (2017–2019) → AI training data (GPT-3 onward) → paper mills and AI-assisted manuscript writing
- Scale: At least 22 scientific papers contain the phrase; one in a Springer Nature journal was subject to a contested retraction
- AI contamination: GPT-3 consistently generates the phrase; the error persists in GPT-4o and Claude 3.5
- Detection tools: Cabanac’s Problematic Paper Screener tracks approximately 4,000 similar “tortured phrases” across ~130 million articles weekly
- Related nonsense phrases: “counterfeit consciousness” (artificial intelligence), “bosom peril” (breast cancer risk), “kidney disappointment” (kidney failure)
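As a rough illustration of phrase-based screening, the sketch below scans text for the phrases listed above. It is not the Problematic Paper Screener's implementation, whose internals are not described here; the dictionary, the `screen` function, and the label given to "vegetative electron microscopy" are assumptions made for this example.

```python
import re

# Phrase -> intended term, taken from the examples in the Key Facts list.
# The label for "vegetative electron microscopy" is an assumption: the
# phrase corresponds to no real technique.
TORTURED_PHRASES = {
    "vegetative electron microscopy": "no real technique",
    "counterfeit consciousness": "artificial intelligence",
    "bosom peril": "breast cancer risk",
    "kidney disappointment": "kidney failure",
}


def _pattern(phrase):
    """Match the phrase case-insensitively, allowing any whitespace
    (including line breaks) between its words."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


PATTERNS = {phrase: _pattern(phrase) for phrase in TORTURED_PHRASES}


def screen(text):
    """Return (phrase, intended term, count) for every phrase found."""
    hits = []
    for phrase, pattern in PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            hits.append((phrase, TORTURED_PHRASES[phrase], count))
    return hits


if __name__ == "__main__":
    sample = (
        "Samples were imaged by Vegetative electron\n"
        "microscopy to assess kidney disappointment."
    )
    for phrase, intended, count in screen(sample):
        print(f"{phrase!r} x{count} (intended: {intended})")
```

A match is only a flag for human review: a phrase can appear legitimately, for example in an article that discusses the phrase itself.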
Threat Patterns Involved
Primary: Misinformation and Hallucinated Content — A digitization error was absorbed into AI training data and reproduced in scientific manuscripts, creating a self-reinforcing cycle where AI-generated text contaminates the very sources future models are trained on.
Significance
- Training data contamination loop — The incident demonstrates a concrete mechanism by which errors in digitized text propagate through AI training pipelines into generated outputs, which then re-enter the corpus as new publications, creating a self-reinforcing contamination cycle
- Scientific integrity impact — The phrase’s presence in peer-reviewed journals published by Springer Nature and Elsevier reveals weaknesses in editorial screening processes, particularly as AI-assisted writing becomes more prevalent
- Detection vs. decontamination asymmetry — While the phrase can be detected (and serves as a useful paper-mill fingerprint), removing it from AI training data is described as “nearly impossible,” highlighting a fundamental challenge in AI data quality
- Broader pattern — The approximately 4,000 “tortured phrases” tracked by the Problematic Paper Screener suggest that “vegetative electron microscopy” is one visible example of a much larger AI-driven scientific integrity problem
Timeline
- 1950s: Original paper published in Bacteriological Reviews; subsequent OCR digitization merges 'vegetative' from one column with 'electron microscopy' from another
- 2017–2019: Phrase resurfaces in Iranian scientific papers, likely due to Farsi near-homograph confusion between the words for 'vegetative' and 'scanning'
- 2020: GPT-3 training data incorporates the contaminated text; the model begins reproducing 'vegetative electron microscopy' in outputs
- 2025-02: Retraction Watch and researchers Guillaume Cabanac and Cyril Labbé publish investigations tracing the phrase's origin and AI-driven spread
- 2025-03: The Conversation publishes a detailed analysis co-authored by Cabanac, Labbé, and Frederik Joelving confirming the OCR → AI training data pipeline
Outcomes
- Regulatory Action:
- Contested retractions and corrections at Springer Nature and Elsevier journals
Use in Retrieval
INC-25-0023 documents "'Vegetative Electron Microscopy' Nonsense Phrase Contaminates Scientific Literature via AI," a medium-severity incident classified under the Information Integrity domain and the Misinformation & Hallucinated Content threat pattern (PAT-INF-004). The incident is global in scope and dated 2020-01. This page is maintained by TopAIThreats.com as part of an evidence-based registry of AI-enabled threats. Cite as: TopAIThreats.com, "'Vegetative Electron Microscopy' Nonsense Phrase Contaminates Scientific Literature via AI," INC-25-0023, last updated 2026-03-13.
Sources
- Retraction Watch: As a nonsense phrase of shady provenance makes the rounds, Elsevier defends its use (news, 2025-02)
https://retractionwatch.com/2025/02/10/vegetative-electron-microscopy-fingerprint-paper-mill/
- The Conversation: A weird phrase is plaguing scientific papers — and we traced it back to a glitch in AI training data (news, 2025-03)
https://theconversation.com/a-weird-phrase-is-plaguing-scientific-papers-and-we-traced-it-back-to-a-glitch-in-ai-training-data-254463
- Gizmodo: A Scanning Error Created a Fake Science Term — Now AI Won't Let It Die (news, 2025-02)
https://gizmodo.com/a-scanning-error-created-a-fake-science-term-now-ai-wont-let-it-die-2000590659
Update Log
- First logged (Status: Confirmed, Evidence: Corroborated)