Evasion Attack
Adversarial inputs crafted to cause a deployed AI model to misclassify or fail to detect malicious content, allowing threats to bypass automated defenses.
Definition
An evasion attack is a class of adversarial attack performed at inference time against a deployed machine learning model. The attacker crafts inputs that are subtly modified to cause the model to produce incorrect outputs — typically misclassifying malicious inputs as benign. Unlike data poisoning attacks that target the training phase, evasion attacks operate against the model as it exists in production. Perturbations may be imperceptible to human reviewers yet sufficient to cross the model’s decision boundary. Evasion attacks are applicable to image classifiers, malware detectors, spam filters, intrusion detection systems, and any AI model that makes binary or categorical security decisions on incoming data.
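The core mechanic can be illustrated with a toy sketch. The snippet below is not any particular production detector: it assumes a hypothetical linear "malware score" model with made-up random weights, and applies an FGSM-style perturbation (a small step against the sign of the score's gradient, which for a linear model is simply the weight vector) to push a flagged input across the decision boundary.

```python
import numpy as np

# Hypothetical linear "detector": score > 0 means the input is flagged
# as malicious. The weights are random stand-ins, not a real model.
rng = np.random.default_rng(0)
w = rng.normal(size=20)

def flagged(x):
    return float(x @ w) > 0

# A feature vector that sits just on the malicious side (score = 0.5).
x = w * (0.5 / (w @ w))

# FGSM-style evasion: step each feature by at most epsilon against the
# sign of the gradient of the score with respect to the input.
epsilon = 0.1
x_adv = x - epsilon * np.sign(w)

print(flagged(x), flagged(x_adv))  # the small perturbation flips the verdict
```

The per-feature change is bounded by `epsilon`, which is what makes the modification potentially imperceptible to a human reviewer even though the classifier's output flips.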
How It Relates to AI Threats
Evasion attacks are a primary concern within the Security and Cyber Threats domain. Under the adversarial evasion sub-category, these attacks directly undermine AI-powered security infrastructure by enabling malicious payloads to pass through automated defenses undetected. As organizations increasingly rely on AI for threat detection — including antivirus engines, network intrusion detection, and content moderation systems — evasion techniques provide attackers with a systematic method to neutralize those defenses. The threat is compounded by transferability: adversarial examples crafted against one model frequently succeed against other models trained on similar data, enabling attackers to bypass defenses they cannot directly access.
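Transferability can be sketched in the same toy linear setting. The example below assumes two hypothetical detectors whose weights are correlated (as if trained on similar data): the attacker crafts the evasion against a surrogate they control, and it also fools the unseen target model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Surrogate model the attacker fully controls, and an unseen target
# model trained on similar data (weights correlated, not identical).
w_surrogate = rng.normal(size=50)
w_target = w_surrogate + 0.1 * rng.normal(size=50)

def flagged(x, w):
    return float(x @ w) > 0

# An input both models flag as malicious.
x = w_surrogate * (0.5 / (w_surrogate @ w_surrogate))

# The evasion is crafted against the surrogate only...
x_adv = x - 0.1 * np.sign(w_surrogate)

# ...yet it also evades the target the attacker never queried.
print(flagged(x_adv, w_surrogate), flagged(x_adv, w_target))
```

Because models trained on similar data tend to learn similar decision boundaries, a perturbation that crosses one boundary frequently crosses the other, which is why attackers can bypass defenses they cannot directly access.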
Why It Occurs
- AI security models learn decision boundaries that differ from human judgment, creating exploitable mathematical blind spots
- Attackers can iteratively probe deployed models to map their classification thresholds without needing internal access
- Perturbation techniques developed in academic research are freely available and adaptable to real-world attack scenarios
- Defenders must achieve correct classification on all inputs while attackers need only one successful bypass per campaign
- Model updates and retraining cycles create windows during which new evasion techniques remain effective against outdated defenses
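The second point above, black-box probing without internal access, can be sketched as a simple bisection: given one input the deployed model flags and one it does not, an attacker who sees only boolean verdicts can binary-search along the segment between them to locate the decision threshold. The oracle below is a hypothetical stand-in, not a real detector.

```python
import numpy as np

# Hypothetical black-box detector: the attacker sees only the verdict.
_w = np.array([1.5, -2.0, 0.5])

def query(x):
    """Oracle access to the deployed model: True if the input is flagged."""
    return float(x @ _w) > 1.0

# The attacker knows one flagged input and one clean input.
x_malicious = np.array([2.0, 0.0, 0.0])  # query -> True
x_benign = np.array([0.0, 0.0, 0.0])     # query -> False

# Bisect along the segment between them to map the decision threshold
# using query access alone (no gradients, no model internals).
lo, hi = 0.0, 1.0  # fraction of the way from benign to malicious
for _ in range(30):
    mid = (lo + hi) / 2
    x_mid = x_benign + mid * (x_malicious - x_benign)
    if query(x_mid):
        hi = mid  # still flagged: boundary lies closer to the benign end
    else:
        lo = mid  # not flagged: boundary lies further along

# hi now pinpoints the boundary; a sample just past it is a
# near-boundary evasive input found with ~30 queries.
print(round(hi, 4))
```

Each query halves the uncertainty, so a few dozen queries suffice to map a threshold along one direction, which is why rate-limiting and query monitoring are common defensive responses.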
Real-World Context
Evasion attacks have been demonstrated against commercial malware detection systems, where small modifications to executable files cause AI-based scanners to classify them as benign. Research has shown that adversarial perturbations can defeat image-based CAPTCHA systems, traffic sign classifiers in autonomous vehicles, and AI-powered content moderation filters. The incident catalogued as INC-25-0001, involving AI-orchestrated cyber espionage, illustrates the broader context in which evasion techniques enable sophisticated threat actors to bypass automated defenses. NIST guidance and the EU AI Act both identify adversarial robustness as a core requirement for high-risk AI deployments.
Last updated: 2026-02-14