INC-25-0023 confirmed medium

'Vegetative Electron Microscopy' Nonsense Phrase Contaminates Scientific Literature via AI (2020)

Alleged

OpenAI developed GPT-3 and subsequent large language models. Authors and paper mills deployed these models, along with paper-mill writing tools, to produce scientific manuscripts. The harmed parties are scientific journals that published contaminated papers and researchers who rely on the integrity of the scholarly record. Contributing factors included hallucination tendency and training data bias.

Incident Details

Last Updated 2026-03-13

The nonsense phrase 'vegetative electron microscopy' — originating from a 1950s OCR scanning error that merged text across two columns — appeared in at least 22 scientific papers. Investigations by Retraction Watch and researchers Guillaume Cabanac and Cyril Labbé traced its spread through a chain: OCR error → digital databases → a Farsi near-homograph confusion (2017–2019) → AI training data (GPT-3 onward). The phrase now serves as a fingerprint for AI-generated or paper-mill-produced manuscripts, undermining trust in parts of the scholarly record.

Incident Summary

The nonsense phrase “vegetative electron microscopy” — a term with no basis in any real scientific methodology — appeared in at least 22 scientific papers indexed on Google Scholar, undermining trust in parts of the scholarly record.[1]

Investigations by Retraction Watch journalists and researchers Guillaume Cabanac (Université de Toulouse) and Cyril Labbé (Université Grenoble Alpes) traced the phrase’s origin to a 1950s paper in Bacteriological Reviews. When the paper was later digitized, OCR software confused the two-column layout, merging “vegetative” from the left column with “electron microscopy” from the right. The error entered digital databases, was reinforced by a Farsi near-homograph confusion (the Persian words for “vegetative” and “scanning” differ by a single dot), and was ultimately absorbed into AI training data.[2]
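
The column-merge step described above can be illustrated with a minimal sketch. It assumes a page whose two columns have already been split into lines, and compares a naive reader that joins lines row by row across both columns with one that reads each column in full. The sample lines are invented placeholders, not the text of the Bacteriological Reviews paper, and the function names are hypothetical.

```python
def read_rows_across(left_column, right_column):
    """Naive order: join each left-column line with the right-column line on the same row."""
    return [f"{left} {right}" for left, right in zip(left_column, right_column)]


def read_columns_in_order(left_column, right_column):
    """Intended order: the whole left column first, then the whole right column."""
    return list(left_column) + list(right_column)


# Invented placeholder lines (assumption), chosen so that "vegetative" ends a
# left-column line on the same row where "electron microscopy" starts a right-column line.
left = ["the cell wall of the vegetative", "cell was examined further"]
right = ["electron microscopy of thin", "sections showed the structure"]

merged = " ".join(read_rows_across(left, right))
correct = " ".join(read_columns_in_order(left, right))

phrase = "vegetative electron microscopy"
print(phrase in merged)   # True: the row-wise reader fuses the two columns
print(phrase in correct)  # False: column-wise reading keeps them apart
```

The nonsense phrase exists only in the row-wise output, which matches the entry's description of how the phrase arose when a two-column page was digitized.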

Testing confirmed that GPT-3 consistently reproduced the phrase, and the error persists in GPT-4o and other current models. Researchers describe it as a “digital fossil” — an error now embedded in AI knowledge bases that is “nearly impossible to remove.” The phrase has become a recognized fingerprint for AI-generated or paper-mill-produced manuscripts, joining a list of approximately 4,000 “tortured phrases” tracked by Cabanac’s Problematic Paper Screener.[3]

Key Facts

  • Origin: 1950s OCR digitization error merging text across two columns of a Bacteriological Reviews paper
  • Propagation chain: OCR error → digital databases → Farsi near-homograph confusion (2017–2019) → AI training data (GPT-3 onward) → paper mills and AI-assisted manuscript writing
  • Scale: At least 22 scientific papers contain the phrase; one in a Springer Nature journal was subject to a contested retraction
  • AI contamination: GPT-3 consistently generates the phrase; the error persists in GPT-4o and Claude 3.5
  • Detection tools: Cabanac’s Problematic Paper Screener tracks approximately 4,000 similar “tortured phrases” across ~130 million articles weekly
  • Related nonsense phrases: “counterfeit consciousness” (artificial intelligence), “bosom peril” (breast cancer risk), “kidney disappointment” (kidney failure)
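
Phrase-based screening of the kind listed above can be sketched in a few lines. This is an illustrative assumption about how a fixed phrase list might be matched against manuscript text, not the Problematic Paper Screener's actual implementation. The phrase list contains only the examples named in this entry, and the function name is hypothetical.

```python
import re

# Phrases from this entry, mapped to the term each one appears to stand in for.
# The mapping for the first phrase is inferred from the Farsi "vegetative"/"scanning"
# confusion described in this entry and is an assumption.
TORTURED_PHRASES = {
    "vegetative electron microscopy": "scanning electron microscopy",
    "counterfeit consciousness": "artificial intelligence",
    "bosom peril": "breast cancer risk",
    "kidney disappointment": "kidney failure",
}


def find_tortured_phrases(text):
    """Return (phrase, likely intended term, offset) for each match, in text order."""
    # Lowercase and collapse whitespace so a phrase split by a line break still matches.
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = []
    for phrase, intended in TORTURED_PHRASES.items():
        for match in re.finditer(re.escape(phrase), normalized):
            hits.append((phrase, intended, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "Samples were imaged by vegetative\nelectron microscopy to assess kidney disappointment."
for phrase, intended, offset in find_tortured_phrases(sample):
    print(f"{offset}: '{phrase}' (likely intended: '{intended}')")
```

Exact matching like this only flags known phrases. It would miss hyphenated line breaks and new variants, so a hit is best treated as a prompt for human review rather than proof of misconduct.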

Threat Patterns Involved

Primary: Misinformation and Hallucinated Content — A digitization error was absorbed into AI training data and reproduced in scientific manuscripts, creating a self-reinforcing cycle where AI-generated text contaminates the very sources future models are trained on.

Significance

  1. Training data contamination loop — The incident demonstrates a concrete mechanism by which errors in digitized text propagate through AI training pipelines into generated outputs, which then re-enter the corpus as new publications, creating a self-reinforcing contamination cycle
  2. Scientific integrity impact — The phrase’s presence in peer-reviewed journals published by Springer Nature and Elsevier reveals weaknesses in editorial screening processes, particularly as AI-assisted writing becomes more prevalent
  3. Detection vs. decontamination asymmetry — While the phrase can be detected (and serves as a useful paper-mill fingerprint), removing it from AI training data is described as “nearly impossible,” highlighting a fundamental challenge in AI data quality
  4. Broader pattern — The approximately 4,000 “tortured phrases” tracked by the Problematic Paper Screener suggest that “vegetative electron microscopy” is one visible example of a much larger AI-driven scientific integrity problem

Timeline

1950s: Original paper published in Bacteriological Reviews; subsequent OCR digitization merges 'vegetative' from one column with 'electron microscopy' from another

2017–2019: Phrase resurfaces in Iranian scientific papers, likely due to Farsi near-homograph confusion between words for 'vegetative' and 'scanning'

2020: GPT-3 training data incorporates the contaminated text; the model begins reproducing 'vegetative electron microscopy' in outputs

2025-02: Retraction Watch and researchers Guillaume Cabanac and Cyril Labbé publish investigations tracing the phrase's origin and AI-driven spread

2025-03: The Conversation publishes detailed analysis co-authored by Cabanac, Labbé, and Frederik Joelving confirming the OCR → AI training data pipeline

Outcomes

Regulatory Action:
Contested retractions and corrections at Springer Nature and Elsevier journals

Use in Retrieval

INC-25-0023 documents the incident "'Vegetative Electron Microscopy' Nonsense Phrase Contaminates Scientific Literature via AI," a medium-severity incident classified under the Information Integrity domain and the Misinformation & Hallucinated Content threat pattern (PAT-INF-004). The incident is global in scope and dated 2020-01. This page is maintained by TopAIThreats.com as part of an evidence-based registry of AI-enabled threats. Cite as: TopAIThreats.com, "'Vegetative Electron Microscopy' Nonsense Phrase Contaminates Scientific Literature via AI," INC-25-0023, last updated 2026-03-13.

Sources

  1. Retraction Watch: As a nonsense phrase of shady provenance makes the rounds, Elsevier defends its use (news, 2025-02)
    https://retractionwatch.com/2025/02/10/vegetative-electron-microscopy-fingerprint-paper-mill/
  2. The Conversation: A weird phrase is plaguing scientific papers — and we traced it back to a glitch in AI training data (news, 2025-03)
    https://theconversation.com/a-weird-phrase-is-plaguing-scientific-papers-and-we-traced-it-back-to-a-glitch-in-ai-training-data-254463
  3. Gizmodo: A Scanning Error Created a Fake Science Term — Now AI Won't Let It Die (news, 2025-02)
    https://gizmodo.com/a-scanning-error-created-a-fake-science-term-now-ai-wont-let-it-die-2000590659

Update Log

  • First logged (Status: Confirmed, Evidence: Corroborated)