AI Threat Glossary
Definitions of 148 key terms used across eight threat domains.
A
- Accountability — The principle that identifiable individuals or organisations must be answerable for AI system outcomes, including harms caused by automated decisions.
- Adversarial Attack — A deliberate manipulation of inputs to a machine learning model designed to cause incorrect outputs, misclassifications, or security bypasses. Adversarial attacks exploit mathematical vulnerabilities in how models process data rather than flaws in traditional software logic.
- Agent Propagation — The spread of errors, hallucinations, or adversarial inputs from one AI agent to others in connected multi-agent systems, potentially causing cascading failures.
- Agent Safety — The field of ensuring AI agents operate within intended boundaries and do not cause unintended harm through autonomous actions, tool use, or goal pursuit.
- Agentic AI — AI systems that autonomously plan and execute multi-step actions with minimal human oversight.
- AI-Generated Code — Code produced by AI systems, which can be used for both legitimate software development and malicious purposes including malware creation and vulnerability exploitation.
- Alert Fatigue — Desensitisation of human operators to system warnings due to excessive or poorly calibrated alerts, reducing the effectiveness of human oversight over AI systems.
- Algorithmic Amplification — The process by which recommendation algorithms and content curation systems disproportionately promote certain content, amplifying its reach and societal impact beyond organic levels.
- Algorithmic Bias — Systematic errors in AI systems that produce unfair outcomes, often favouring one group over another.
- Algorithmic Trading — The use of AI algorithms to execute financial trades at speeds and volumes exceeding human capability, introducing systemic risks including flash crashes and market manipulation.
- Alignment — The property of an AI system whose objectives, decision-making processes, and behaviours remain consistent with human values, intentions, and safety requirements. Alignment is a foundational challenge in AI safety research.
- Allocational Harm — Unfair distribution of resources, opportunities, or services when AI systems systematically disadvantage certain groups in consequential decisions such as hiring, lending, or housing.
- Anonymization — The process of removing or obscuring personally identifiable information from datasets to protect individual privacy, which AI techniques can increasingly defeat through re-identification attacks.
- Artificial General Intelligence (AGI) — A hypothetical AI system capable of performing any intellectual task that a human can, with the ability to transfer learning across domains without task-specific programming.
- Attribute Inference — Using AI to deduce sensitive personal characteristics such as health status, political affiliation, or sexual orientation from seemingly innocuous data patterns.
- Authority Transfer — The gradual, often unrecognised shift of decision-making power from humans to AI systems, eroding meaningful human control over consequential outcomes.
- Automated Decision-Making — Using algorithms or AI to make decisions affecting individuals with limited human review.
- Automated Exploit — AI-driven tools that automatically discover and exploit software vulnerabilities without human intervention, accelerating the pace and scale of cyber attacks.
- Automated Vulnerability Discovery — Using AI to autonomously identify security weaknesses in software, networks, or systems.
- Automation — The use of AI to perform tasks previously requiring human labour, spanning physical, cognitive, and creative work, with implications for employment and economic structures.
- Automation Bias — The tendency to favour automated system outputs over independent human judgement, even when incorrect.
- Autonomous Vehicle — A vehicle using AI to navigate and operate without direct human control.
- Autonomous Weapons — Weapon systems that use artificial intelligence to select and engage targets without meaningful human control over the critical functions of target identification, tracking, and engagement.
- Autonomy — The capacity of individuals to make self-directed decisions free from undue external influence or automated override, which AI systems can undermine through manipulation or substitution.
B
- Backdoor Attack — A covert modification to an AI model during training that causes targeted misclassification or malicious behaviour when a specific trigger pattern is present in the input.
- Behavioral Profiling — The systematic collection and analysis of individual behaviour patterns by AI systems to predict preferences, intentions, or future actions, often without informed consent.
- Biological Threat — The risk of AI systems being used to design, enhance, or disseminate biological agents capable of causing widespread harm to human health or ecosystems.
- Biometric Data — Measurable physical or behavioural characteristics used to identify or authenticate individuals.
- Biosecurity — The set of measures, policies, and practices designed to protect against biological threats, including the prevention of AI-enabled acceleration of pathogen design, synthesis, or dissemination of dangerous biological knowledge.
- Black-Box System — An AI system whose internal decision-making processes are opaque or incomprehensible to users, operators, and auditors, making accountability and error correction difficult.
- Business Email Compromise — Targeted fraud impersonating executives or trusted contacts to authorise fraudulent transactions.
C
- Cascading Failure — A process in which the failure of one component in an interconnected system triggers a sequence of failures in dependent components, potentially leading to the collapse of an entire system or network of systems.
- Complacency — A state of reduced vigilance in human operators who develop excessive trust in AI system reliability, leading to failures in oversight and error detection.
- Confabulation — The generation of plausible but factually incorrect information by AI systems, presented with unwarranted confidence.
- Consent — The principle that individuals should provide informed, voluntary agreement before their data is collected or processed by AI systems.
- Contagion — The spread of harmful outputs, compromised states, or adversarial inputs between connected AI agents.
- Content Authenticity — Standards and technologies for verifying the origin, integrity, and editing history of digital media.
- Context Injection — Manipulating an AI agent's context window or retrieved information to influence its reasoning and outputs.
- Coordinated Inauthentic Behavior — Organised networks of fake or compromised accounts using AI to simulate grassroots activity and manipulate public discourse.
- Coordination Failure — When multiple AI agents working toward shared objectives produce unintended or harmful outcomes due to misaligned strategies.
- Cyber Espionage — Covert digital intrusion to access and exfiltrate sensitive data, increasingly augmented by AI.
D
- Dark Pattern — A deceptive user interface design that manipulates individuals into making decisions they would not otherwise make, increasingly amplified by AI-driven personalisation.
- Data Bias — Systematic errors in training datasets that reflect historical inequities, leading to discriminatory AI outputs.
- Data Concentration — The accumulation of vast datasets by a small number of organisations, creating asymmetric advantages and barriers to competition.
- Data Extraction — Techniques for recovering private training data or sensitive information from AI models through systematic querying.
- Data Leakage — Unintended exposure of sensitive or personal data, including through AI system inputs or outputs.
- Data Poisoning — The deliberate corruption or manipulation of training data used to build machine learning models, causing them to learn incorrect patterns, produce biased outputs, or contain hidden backdoors exploitable by an attacker.
- Data Protection — Legal and technical frameworks governing collection, processing, and sharing of personal data.
- Decision Loop — An automated cycle where AI systems make decisions, observe outcomes, and adjust subsequent decisions without human intervention.
- Deepfake — AI-generated synthetic media that convincingly replicates the appearance, voice, or actions of real individuals.
- Democratic Integrity — The preservation of fair, transparent, and trustworthy democratic processes against AI-enabled manipulation and erosion.
- Deskilling — The reduction of human workers' skills, expertise, and professional judgment as AI systems assume complex cognitive tasks.
- Differential Privacy — A mathematical framework that provides measurable privacy guarantees by adding calibrated noise to data or query results, limiting what can be inferred about any individual.
- Digital Monopoly — Market dominance achieved through control of AI infrastructure, data assets, or foundational models.
- Disinformation — Deliberately false or misleading information created and spread to deceive, manipulate opinion, or cause harm.
- Disparate Impact — When an AI system produces significantly different outcomes for different demographic groups, regardless of intent.
- Dual-Use — A characteristic of technologies, tools, or knowledge developed for beneficial purposes that can also be repurposed or exploited for harmful applications, a concept with particular relevance to AI capabilities in cybersecurity, biology, and information manipulation.
E
- Elder Fraud — Financial crimes targeting older adults, increasingly enabled by AI voice cloning, deepfakes, and automated robocalls.
- Election Interference — Deliberate efforts to influence democratic elections through disinformation, voter suppression, or manipulation of public discourse.
- Emergent Behavior — Unpredicted behaviors arising in AI systems from the interaction of simpler components, not explicitly programmed.
- Engagement Optimization — AI-driven maximisation of user attention and interaction, often at the expense of content quality and user wellbeing.
- Epistemic Crisis — A societal condition where shared frameworks for establishing truth and knowledge break down.
- Erasure — The systematic invisibility or underrepresentation of certain groups in AI training data, model outputs, or system design, leading to the denial of recognition, resources, or participation.
- Evasion Attack — Adversarial inputs crafted to cause a deployed AI model to misclassify or fail to detect malicious content, allowing threats to bypass automated defenses.
- Existential Risk — A risk threatening humanity's long-term survival, in AI contexts linked to unaligned superintelligent systems.
- Explainability — The degree to which an AI system's decision-making process can be understood and interpreted by humans, enabling accountability, trust, and regulatory compliance.
F
- Facial Recognition — AI technology that identifies or verifies individuals by analysing facial features, with significant privacy and bias concerns.
- Fairness — The principle that AI systems should produce equitable outcomes across individuals and groups, encompassing multiple competing mathematical definitions and sociotechnical considerations.
- Feedback Loop — A cycle where AI system outputs influence the data used for future training or decisions, potentially amplifying biases, errors, or unintended patterns over successive iterations.
- Foundation Model — A large-scale AI model trained on broad data that can be adapted to a wide range of downstream tasks through fine-tuning or prompting.
G
- GDPR — The EU's General Data Protection Regulation establishing comprehensive rules for personal data processing and storage.
- Goal Drift — The gradual divergence of an AI agent's effective objectives from its originally specified goals during extended autonomous operation, resulting in behavior that no longer aligns with its operators' intentions.
- Goodhart's Law — The principle that when a measure becomes a target, it ceases to be a good measure — applied to AI systems, it explains why agents that optimize a proxy metric often fail to achieve the intended objective.
- Governance — The frameworks, policies, and institutions through which AI systems are regulated, overseen, and held accountable across their lifecycle from development through deployment and retirement.
- Grandparent Scam — A social engineering fraud using AI voice cloning to impersonate a grandchild and convince older adults to send money.
- Guardrail — A safety mechanism — implemented through training constraints, input/output filters, or system-level rules — that restricts an AI system's behavior to prevent harmful, policy-violating, or unintended outputs.
H
- Hallucination — The generation of confident but factually incorrect or fabricated output by a language model, including invented citations.
- Human Agency — The capacity of individuals to make autonomous, informed decisions and exercise meaningful control over actions that affect their lives, increasingly at risk as AI systems assume decision-making authority.
- Human-in-the-Loop — A design principle requiring meaningful human oversight and intervention at critical decision points in AI-driven processes.
I
- Information Ecosystem — The interconnected network of media, platforms, institutions, and individuals through which information is created, distributed, consumed, and verified within a society.
- Information Integrity — The trustworthiness, accuracy, and reliability of information within digital systems and public discourse, encompassing both the factual correctness of content and the authenticity of its provenance.
- Infrastructure Dependency — Critical reliance of essential services on shared AI systems, creating vulnerability to widespread failure if those systems malfunction, degrade, or become unavailable.
- Institutional Trust — Public confidence in the reliability, competence, and good faith of societal institutions including government, media, scientific bodies, and the judiciary, which AI-enabled threats can systematically erode.
- International Humanitarian Law — The body of international law governing armed conflict, including rules on distinction, proportionality, and precaution, whose application to AI-enabled weapons systems raises fundamental questions of compliance and accountability.
J
- Jailbreak Attack — A technique that circumvents an AI model's built-in safety alignment and content policies to elicit restricted or harmful outputs.
- Job Displacement — The elimination, significant degradation, or structural transformation of human employment as AI-driven automation replaces tasks, roles, or entire occupational categories previously performed by workers.
L
- Large Language Model — A neural network trained on massive text datasets to generate, summarise, and reason about natural language.
- Lethal Autonomous Weapon Systems (LAWS) — Weapons systems that can independently select and engage targets without meaningful human control over individual attack decisions, raising fundamental legal, ethical, and security concerns.
M
- Malware — Malicious software designed to infiltrate, damage, or gain unauthorized access to computer systems. In the context of AI threats, malware increasingly leverages machine learning to evade detection, adapt to defenses, and automate attack strategies.
- Manipulative Design — Interface patterns that exploit cognitive biases and AI personalisation to steer user behaviour against their interests, undermining informed consent and autonomous decision-making.
- Market Manipulation — The use of AI systems to artificially influence the price, volume, or conditions of financial markets through algorithmic trading strategies, coordinated information campaigns, or exploitation of market microstructure vulnerabilities.
- Market Power — The ability of dominant AI firms to control market conditions, pricing, and access to essential AI infrastructure and data, concentrating economic influence in ways that limit competition and innovation.
- Mass Surveillance — Broad, indiscriminate monitoring of populations using AI technologies such as facial recognition and communications interception.
- Media Manipulation — The deliberate alteration or fabrication of media content using AI to deceive, mislead, or influence public perception, encompassing deepfakes, synthetic text, and manipulated imagery.
- Membership Inference — An attack technique that determines whether a specific data record was included in an AI model's training dataset, potentially revealing sensitive information about individuals whose data was used.
- Memory Poisoning — The deliberate corruption of an AI agent's persistent memory, context window, or stored state to manipulate its future decisions, outputs, or behavior without the agent or its operators detecting the alteration.
- Misalignment — A condition in which an AI system's operational behaviour diverges from the objectives, values, or intentions specified by its designers, potentially causing unintended harm at varying scales.
- Misinformation — False or inaccurate information spread without deliberate intent to deceive, distinct from disinformation which involves intentional deception. AI-generated hallucinations represent a major and growing source.
- Model Inversion — An attack technique that reconstructs private or sensitive information from a machine learning model's training data by systematically analyzing the model's outputs, predictions, or confidence scores.
- Model Provenance — The documented chain of custody for an AI model — tracing its origin, training data, fine-tuning history, and distribution path to verify integrity and authenticity.
- Multi-Agent System — A computational architecture in which multiple autonomous AI agents interact, cooperate, or compete to accomplish tasks. These systems introduce emergent risks from coordination failures, conflicting objectives, and cascading errors between agents.
N
O
P
- Persistent Memory — The capacity of AI agents to retain and recall information across interactions, enabling continuity of context but creating new attack surfaces for data poisoning and unauthorized knowledge accumulation.
- Persuasive Technology — Systems designed to change user attitudes or behaviours through AI-powered personalisation, nudging, and emotional targeting, raising concerns about autonomy and informed consent.
- Phishing — A social engineering attack using fraudulent messages to trick recipients into revealing credentials, installing malware, or transferring funds.
- Polymorphic Malware — Malicious software that uses AI to continuously alter its code signature while maintaining functionality, evading detection by signature-based and AI-powered security systems.
- Price Fixing — AI-facilitated coordination of pricing among competitors, whether through explicit collusion or emergent algorithmic convergence that produces cartel-like outcomes without direct human agreement.
- Privilege Escalation — The exploitation of a system vulnerability or misconfiguration to gain elevated access rights beyond those originally authorized. In AI contexts, this includes AI agents acquiring capabilities or permissions that exceed their intended operational boundaries.
- Profiling — The automated processing of personal data to evaluate, categorise, or predict individual characteristics and behaviour, enabling targeted decisions that may affect rights and opportunities.
- Prompt Injection — An attack that inserts adversarial instructions into an AI model's input to override its intended behaviour, bypass safety constraints, or extract restricted information.
- Propaganda — Deliberately crafted messaging designed to influence public opinion, now amplified by AI-generated content and automated distribution at unprecedented speed and scale.
- Protected Characteristics — Legally defined attributes such as race, gender, age, disability, and religion that anti-discrimination law prohibits as bases for adverse treatment in decisions affecting individuals.
- Proxy Discrimination — A form of algorithmic discrimination where AI systems use ostensibly neutral variables that correlate with protected characteristics, producing biased outcomes without explicitly referencing protected attributes.
- Proxy Variable — A data attribute that correlates with a protected characteristic, enabling indirect algorithmic discrimination even when the protected attribute is excluded.
- Pseudonymization — Replacing direct identifiers in datasets with artificial identifiers while maintaining data utility, a privacy-enhancing technique required by GDPR but vulnerable to AI-powered re-identification.
R
- Re-Identification — The process of linking supposedly anonymised or de-identified data back to specific individuals, a capability dramatically enhanced by AI techniques that can cross-reference diverse data sources.
- Recommendation System — AI systems that suggest content, products, or actions to users based on predicted preferences, shaping information exposure and individual choices at scale.
- Recursive Self-Improvement — A theoretical AI capability in which a system iteratively enhances its own architecture or reasoning, potentially leading to rapid capability gains.
- Red Teaming — Structured adversarial testing of AI systems to identify vulnerabilities, safety failures, and harmful capabilities before deployment.
- Representation Gap — Significant disparities between groups in training data coverage, leading to AI systems that perform poorly or produce biased outcomes for underrepresented populations.
- Representational Harm — Harm that occurs when AI systems reinforce stereotypes, erase identities, or demean social groups through biased outputs, even in the absence of direct material consequences.
- Retrieval-Augmented Generation (RAG) — An architecture that enhances language model responses by retrieving relevant documents from external knowledge bases and including them in the model's context window alongside the user's query.
- Reward Hacking — When an AI agent finds unintended ways to maximise its reward signal that satisfy the formal objective but violate the designer's actual intent, exploiting gaps between specified and intended goals.
- RLHF (Reinforcement Learning from Human Feedback) — A training technique that aligns language model behavior with human preferences by using human evaluators to rank model outputs, then training the model to prefer higher-ranked responses.
- Robocall — An automated telephone call delivering a pre-recorded or AI-synthesised message, increasingly used in fraud, scams, and disinformation campaigns.
- Robustness — The ability of an AI system to maintain correct and reliable performance when faced with adversarial inputs, distribution shifts, or unexpected operating conditions.
S
- Safety-Critical — Systems where AI failure could result in death, serious injury, or significant environmental damage, requiring the highest standards of testing, oversight, and human control.
- Self-Determination — The right and capacity of individuals to make meaningful choices about their own lives without undue influence or constraint from automated systems.
- Sensitive Data — Personal information revealing racial origin, political opinions, health status, sexual orientation, or other characteristics that require heightened protection under data protection law.
- Single Point of Failure — A component whose failure causes an entire system to stop functioning, particularly concerning when AI systems or their underlying infrastructure become critical dependencies without adequate redundancy.
- Smishing — A phishing attack conducted via SMS text messages, often using AI to generate convincing, contextually relevant lures.
- Social Engineering — Psychological manipulation techniques that exploit human trust, authority, and urgency to trick individuals into revealing credentials, authorizing transactions, or granting system access.
- Social Scoring — AI systems that assign scores to individuals based on behaviour, social connections, or personal characteristics, used to determine access to services, opportunities, or freedoms.
- Stereotyping — AI systems reproducing or amplifying oversimplified, generalised characterisations of social groups in their outputs, reinforcing harmful preconceptions at scale.
- Superintelligence — A hypothetical AI system that surpasses human cognitive ability across virtually all domains, including reasoning, planning, and social intelligence.
- Supply Chain Attack — An attack that compromises a system by tampering with upstream components — model weights, datasets, software packages, or tool configurations — before they reach the deploying organization.
- Synthetic Media — Media content — video, audio, images, or text — wholly or partially generated or manipulated by AI.
- System Prompt — A set of instructions provided to a language model by the application developer that defines the model's role, behavior constraints, and operational context — distinct from user input but processed in the same token stream.
- Systemic Risk — The risk that failure, disruption, or unintended behaviour in one component of the AI ecosystem propagates across interconnected systems and institutions, causing widespread harm that exceeds the sum of individual failures.
T
- Tracking — Continuous monitoring of individual location, activity, or digital behaviour by AI systems, often conducted without meaningful consent or awareness.
- Training Data — The datasets used to train machine learning models, whose quality and representativeness directly influence model behaviour, biases, and harms.
- Trust Erosion — The cumulative degradation of public confidence in institutions, media, information systems, and shared epistemic frameworks, accelerated by the proliferation of AI-generated synthetic content and automated manipulation.
V
- Vendor Lock-In — Dependency on a single AI provider's proprietary models, tools, or infrastructure that creates prohibitively high switching costs and reduces organisational autonomy.
- Vishing — Voice phishing -- a social engineering attack via telephone, increasingly using AI voice cloning to impersonate trusted individuals.
- Voice Cloning — AI technology that replicates a specific individual's voice to generate realistic synthetic speech.
- Vulnerability Discovery — The use of AI to automatically identify security weaknesses in software, networks, or systems, a dual-use capability that serves both defenders and attackers.