Top AI Threats
AI Capability

Foundation Model

A large-scale AI model trained on broad data that can be adapted to a wide range of downstream tasks through fine-tuning or prompting.

Definition

A foundation model is a large-scale machine learning model trained on broad, diverse datasets using self-supervised or semi-supervised learning, designed to be adapted to a wide range of downstream tasks. The term, introduced by Stanford’s Center for Research on Foundation Models in 2021, encompasses large language models, vision transformers, multimodal models, and other architectures that serve as general-purpose bases for specialised applications. Foundation models are characterised by their scale, generality, and the emergent capabilities that arise from training on massive corpora. They form the infrastructure layer upon which applications including chatbots, code assistants, image generators, and autonomous agents are built.
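The "one base, many downstream tasks" pattern described above can be sketched in miniature. Everything here is invented for illustration: the class names, the toy hash-style "embedding", and the adapter weights stand in for a large pre-trained network and its lightweight task-specific heads.

```python
# Illustrative sketch only: a toy "foundation model" reused across tasks.
# Real foundation models are large neural networks; the tiny handcrafted
# features below merely stand in for learned embeddings.

class ToyFoundationModel:
    """Stands in for an expensive pre-trained base: one shared representation."""

    def embed(self, text: str) -> list:
        # Deterministic toy features derived from the text; a real model
        # would produce learned contextual embeddings.
        return [
            len(text) / 100.0,
            text.count(" ") / 10.0,
            (sum(map(ord, text)) % 97) / 97.0,
        ]


class TaskAdapter:
    """A lightweight task-specific head adapted on top of the frozen base."""

    def __init__(self, base: ToyFoundationModel, weights: list):
        self.base = base          # shared, frozen base model
        self.weights = weights    # only these small parameters are task-specific

    def score(self, text: str) -> float:
        features = self.base.embed(text)
        return sum(w * f for w, f in zip(self.weights, features))


base = ToyFoundationModel()                        # trained once, at great cost
sentiment = TaskAdapter(base, [0.2, 0.5, 0.3])     # cheap downstream adaptation
moderation = TaskAdapter(base, [0.9, -0.1, 0.4])   # another task, same base
print(sentiment.score("foundation models are shared infrastructure"))
```

The point of the sketch is the asymmetry: the base is built once, while each downstream application adds only a small amount of task-specific adaptation, which is why so many applications converge on the same underlying models.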

How It Relates to AI Threats

Foundation models are relevant across multiple threat domains due to their role as shared infrastructure. Within Information Integrity, their generative capabilities enable misinformation and hallucinated content at scale. Within Security & Cyber, vulnerabilities in a foundation model — such as data poisoning during pre-training — propagate to every downstream application built upon it. Within Systemic & Catastrophic, the concentration of AI capabilities in a small number of foundation models creates infrastructure dependency, where a failure, compromise, or policy change by a single model provider can cascade across thousands of applications and organisations simultaneously.
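Because an upstream compromise propagates to every downstream application, deployers commonly pin the exact model artifact they build on and verify it before use. A minimal sketch with Python's `hashlib`; the artifact bytes and the pinned digest are invented here purely for illustration.

```python
import hashlib

# Hypothetical pinned digest, as a provider might publish alongside a release.
# (Derived here from placeholder bytes so the example is self-contained.)
PINNED_SHA256 = hashlib.sha256(b"model-weights-v1").hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded artifact matches the pinned digest.

    Pinning exact bytes means a silently modified upstream checkpoint
    (e.g. a poisoned re-upload) fails verification instead of propagating
    into every downstream application built on it.
    """
    return hashlib.sha256(data).hexdigest() == expected_sha256


print(verify_artifact(b"model-weights-v1", PINNED_SHA256))           # True
print(verify_artifact(b"model-weights-v1-tampered", PINNED_SHA256))  # False
```

Hash pinning does not detect poisoning introduced before the provider published the weights; it only guarantees that what you deployed is what the provider released, which narrows the trust question to the provider itself.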

Why It Occurs

  • Self-supervised learning on internet-scale datasets enables models to acquire broad capabilities without task-specific labelling
  • Scaling laws demonstrate predictable capability improvements with increased compute, data, and parameters
  • The economic efficiency of adapting a single pre-trained model to many tasks drives industry convergence around foundation model architectures
  • Competitive pressure among a small number of organisations accelerates capability development
  • Regulatory frameworks have not yet established comprehensive requirements for foundation model providers, though the EU AI Act has begun to address this
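The scaling-laws bullet can be made concrete. The Chinchilla analysis (Hoffmann et al., 2022) fit pre-training loss as L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens. The sketch below uses the fitted constants reported in that work; treat the specific numbers as illustrative rather than authoritative.

```python
# Chinchilla-style scaling law: loss falls predictably as parameters (N)
# and training tokens (D) grow. Constants are the fitted values reported
# by Hoffmann et al. (2022); treat them as illustrative, not exact.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss L(N, D) = E + A/N^alpha + B/D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


for n in (1e8, 1e9, 1e10, 1e11):
    # Chinchilla's compute-optimal rule of thumb: roughly 20 tokens/parameter.
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n, 20 * n):.3f}")
```

The monotonic, predictable decline in predicted loss as compute grows is what makes capability investment legible to providers in advance, and it is one driver of the competitive scaling race the bullets describe.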

Real-World Context

Foundation models underpin the AI systems involved in multiple documented incidents. The Samsung data leak (INC-23-0002) occurred through employee interaction with a chatbot built on a foundation model. Italy’s temporary ban (INC-23-0003) targeted a foundation model deployment on data protection grounds. The hallucinated legal citations case (INC-23-0005) demonstrated failure modes inherent to foundation model architectures. The EU AI Act includes specific provisions for “general-purpose AI models,” recognising foundation models as a distinct regulatory category requiring transparency obligations and, for models posing systemic risk, additional safety evaluations.

Last updated: 2026-02-14