Theory of Change
AISI's founding mission: "minimise surprise to the UK and humanity from rapid and unexpected advances in AI." Chief Scientist Geoffrey Irving describes two core functions: "(1) a channel for information flowing to government and governments plural about risks from frontier AI... and (2) actually mitigate the problem by working on both AI developer side mitigations and non-model mitigations."
The causal chain: AISI evaluates frontier models, identifies risks, shares findings with labs (who patch vulnerabilities) and with governments (who can act on the intelligence). Irving frames a deeper purpose: AISI exists to "enable coordination towards safer actions" because "market forces mean that there is constant pressure to prioritize speed over caution."
Irving on the current safety approach: "You're not going to get to a lot of nines with the current technology... all of the approaches we have now look like they're empirical. Maybe they'll go through... all these have correlated potential failures where they could in fact all fail for the same essential reason." He declined to give a probability estimate but said of loss of control: "we view it as a potential catastrophic risk." For a government chief scientist to state publicly that the current approach may yield only "a couple of nines" of reliability is extraordinary candor.
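The arithmetic behind "nines" and correlated failure can be sketched in a few lines. This is an illustrative toy model, not Irving's or AISI's: the function names, the 10% per-layer failure rate, the four layers, and the 1% shared-flaw probability are all assumptions chosen for the example.

```python
import math


def nines(failure_prob: float) -> float:
    """Express reliability as a 'number of nines' (0.01 -> 2.0, 0.001 -> 3.0)."""
    return -math.log10(failure_prob)


def joint_failure(per_layer_failure: float, layers: int,
                  common_cause: float = 0.0) -> float:
    """Probability that every safety layer fails at once.

    Toy model (assumption): with probability `common_cause` a shared flaw
    defeats all layers together; otherwise the layers fail independently.
    """
    independent = per_layer_failure ** layers
    return common_cause + (1.0 - common_cause) * independent


# Hypothetical numbers: four layers, each failing 10% of the time.
independent_only = joint_failure(0.1, layers=4)
with_shared_flaw = joint_failure(0.1, layers=4, common_cause=0.01)

print(f"independent layers: {nines(independent_only):.2f} nines")
print(f"with shared flaw:   {nines(with_shared_flaw):.2f} nines")
```

Under these assumed numbers, independent layers reach about four nines, while a 1% chance of a shared flaw caps the stack at about two. Adding more layers does not help once the common-cause term dominates, which is the point of the "fail for the same essential reason" remark.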
What They Do
Frontier model evaluation. AISI has tested 30+ models since November 2023, including pre-deployment evaluations of Claude 3.5 Sonnet, OpenAI o1, and others. Evaluations span cyber, chem-bio, persuasion, autonomy, and safeguards. The red team has found jailbreaks against every model tested, a 100% success rate. In the best-defended domain (bio), the effort required to jailbreak increased 40x between two model versions released six months apart -- defenses are improving but have never been unbreakable. Irving emphasized that jailbreaking techniques transfer across models but specific jailbreaks do not: "you'd have to search again for the next model."
Frontier AI Trends Report. AISI's flagship public output, synthesizing two years of testing. Key findings: cyber capabilities doubling every 8 months, models surpassing PhD experts in biology (including troubleshooting from photos of experimental setups), self-replication success rates climbing from 5% to 60% (2023-2025), universal jailbreaks found for every system tested. Irving cautioned against the common narrative that RL only works for verifiable domains: "the RL models are in fact way better at [non-verifiable tasks]... it's also because we did RL on fuzzier stuff" -- suggesting capabilities are advancing faster than many expect.
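To make the 8-month doubling figure concrete, here is a minimal sketch that assumes smooth exponential growth. That is an assumption made for illustration: the report's underlying capability measure and curve fit are not reproduced here, and `growth_factor` is a hypothetical helper name.

```python
def growth_factor(months: float, doubling_months: float = 8.0) -> float:
    """Multiplicative growth after `months`, given a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


# Implied growth over one and two years if the 8-month doubling holds.
one_year = growth_factor(12)
two_years = growth_factor(24)

print(f"12 months: x{one_year:.2f}")
print(f"24 months: x{two_years:.2f}")
```

On this reading, an 8-month doubling time implies roughly a 2.8x increase per year and an 8x increase over the two years of testing the report covers.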
Open-source tools. Inspect (evaluation framework, 100+ pre-built evaluations, 50+ contributors) has become the standard tool for AI testing across governments and businesses. Also released ControlArena (AI control experiments), RepliBench (self-replication benchmark), and the recent boundary point jailbreaking method.
Research and publication. 10 papers at NeurIPS 2025. Persuasion study published in Science (77,000 participants, 91,000 dialogues -- finding AI chatbots 41-52% more persuasive than static messages). Researchers post actively on LessWrong and the Alignment Forum about eval awareness, alignment honeypots, and untrusted monitoring.
Grant-making. Alignment Project: GBP 27M, 60 grantees from 800+ applications across 42 countries. Advisory board chaired by Yoshua Bengio with Buck Shlegeris (Redwood Research). Systemic Safety Grants: GBP 8.5M. Challenge Fund: GBP 5M. Total: ~GBP 40M. Irving is explicitly trying to attract theorists from complexity theory, information theory, game theory, and cognitive science -- fields that "have a bunch of models" but "are just beginning to take AI seriously."
International coordination. First bilateral government evaluation of a frontier model (OpenAI o1, with US AISI). Hosts secretariat for International AI Safety Report (Bengio-chaired, 100 experts, 30 countries). Founding member of International Network of AISIs. San Francisco office opened May 2024 for proximity to US labs.
Key People
Ian Hogarth CBE, Chair. Tech investor, co-founded Songkick. Invested in 50+ AI companies including Anthropic (divested upon taking role at June 2023 prices, before subsequent appreciation). His April 2023 FT essay "We must slow down the race to God-like AI" -- which called AI "the most important thing happening in the world" and proposed a CERN-like containment facility -- led directly to his appointment. Continues as General Partner at Plural VC.
Geoffrey Irving, Chief Scientist. PhD CS. Led alignment teams at both OpenAI (Reflection Team, co-authored original RLHF paper) and DeepMind (Scalable Alignment). Joined April 2024. Left DeepMind because he worried "progress on safety is too slow relative to the rate of advancement in AI" and found it "a personal relief to be doing work I'm confident is good overall" -- implying his lab work may have been net negative. His alignment research priorities include debate, formal methods, and understanding long-horizon agent dynamics.
Jade Leung, CTO and PM's AI Adviser. Rhodes Scholar, DPhil Oxford AI governance. Co-founded GovAI with Allan Dafoe. Former OpenAI governance lead. Took a major pay cut to join AISI. Married to Markus Anderljung (GovAI Director of Policy and Research). Only 31.
Team: ~100 technical staff, ~250 total. Leadership transition in October 2025: founding director Oliver Ilott promoted to DG for AI at DSIT, replaced by Adam Beaumont (ex-GCHQ Chief AI Officer) -- a shift from policy to national security background that mirrors the safety-to-security rebrand.
Money and Incentives
Funding is 100% UK government (HM Treasury via DSIT). No lab money, no philanthropic strings, no equity holders. This is structurally unique among major AI safety organizations.
- Initial investment: GBP 100M (2023, Frontier AI Taskforce)
- Annual budget: ~GBP 50M (~$65M) per year, approximately 10x the US AISI/CAISI budget
- Spending Review 2025: GBP 240M for 2026-2030 (~GBP 60M/year average)
- Grant programs: ~GBP 40M disbursed (Alignment Project + Systemic Safety + Challenge Fund)
- Alignment Project receives in-kind lab contributions: GBP 5M compute from AWS, GBP 5.6M from OpenAI, plus Anthropic and Microsoft as partners
Budget breakdown is not publicly available. How the ~GBP 50M annual budget splits between staff (~250 people), compute (likely via the Isambard-AI supercomputer plus AWS credits), grants (~GBP 40M in total across the three programs), and operations is undisclosed.
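The headline figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this profile. Reading "2026-2030" as four financial years is an assumption, and the variable names are illustrative.

```python
# All figures in GBP millions, taken from this profile.
GRANT_PROGRAMS = {
    "Alignment Project": 27.0,
    "Systemic Safety Grants": 8.5,
    "Challenge Fund": 5.0,
}
SPENDING_REVIEW_TOTAL = 240.0
SPENDING_REVIEW_YEARS = 4  # assumption: 2026-2030 read as four financial years
ANNUAL_BUDGET = 50.0

grants_total = sum(GRANT_PROGRAMS.values())
average_per_year = SPENDING_REVIEW_TOTAL / SPENDING_REVIEW_YEARS
grants_share_of_one_year = grants_total / ANNUAL_BUDGET

print(f"Grant programs total: GBP {grants_total}M")
print(f"Spending Review average: GBP {average_per_year}M/year")
print(f"Grants as share of one year's budget: {grants_share_of_one_year:.0%}")
```

The grant total comes to GBP 40.5M, about four-fifths of a single year's budget. That would be consistent with the grants being spread over more than one year or funded partly outside the core budget, but the source does not say which.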
Incentive dynamics. Pure government funding eliminates financial conflicts of interest but creates total dependence on political will. The UK government wants AI to "unleash growth" (Peter Kyle's explicit framing at the Munich Security Conference where the rebrand was announced). TIME reported the key tension: "it was far more important to keep the labs friendly and collaborative, officials believed, than to antagonize them and risk torpedoing the access to models upon which the AISI relied to do its job."
Hogarth's conflict mitigations. Divested Anthropic, Conjecture, Lakera, Faculty, and Stability AI holdings at June 2023 prices (giving up significant paper gains from subsequent appreciation). Recuses from procurement decisions involving portfolio companies. Spouse is CEO of Supercritical (carbon removal). But Plural VC continues investing in AI-adjacent companies.
What Others Say
"Beyond Benchmarks" (arXiv paper): "The inability to generalize from evaluation data to real-world applications undermines the very foundation of ex-ante AI regulation." Argues that deep learning lacks the causal world models that make scientific regulation possible in other domains. Unlike a crash test, where physics lets you extrapolate from one speed to another, there is no theory linking AI benchmark performance to real-world behavior.
Gabriel Alfour (ControlAI): "The Theory of Change behind Evals is broken... Evals only make sense in the presence of regulations which do not exist, and they crowd out effort at passing such regulations... Evals move the burden of proof away from AI Corporations."
Ada Lovelace Institute: "A testing regime is only meaningful with pre-market approval powers underpinned by statute." Identified three failures: technical limitations of benchmarking, bait-and-switch between evaluated and deployed models, and voluntary framework fraying (3 of 4 labs non-compliant with pre-deployment access commitments).
BRAID UK: AISI pursues a "narrowly technical approach" that ignores social, humanistic, and ethical dimensions. Historical parallel: crash test dummies designed for male bodies failed women because safety was treated as purely technical.
TIME Magazine: "Can the fledgling AI Safety Institute really hold billion-dollar tech giants accountable?" and "Without any legal ability to compel labs to act, the AISI could be seen -- from one angle -- as a taxpayer-funded helper to several multibillion-dollar companies."
Irving himself (from inside the organization): "The science of evaluations is not strong enough that we can confidently rule out all risks from doing these evaluations... some of those experiments, at least with the current level of access, can only be conducted at the labs." When the chief scientist acknowledges the limits this frankly, it is more revealing than any external critique.
What's Absent
- No enforcement actions. AISI has never publicly recommended that a lab delay or modify a deployment. All interactions remain behind closed doors. If AISI has influenced deployment decisions, the public cannot verify it.
- No statutory authority despite two years of operation and a change of government. Labour promised "binding regulation on the handful of companies developing the most powerful AI models" but has not delivered an AI Bill.
- Non-compliant labs unnamed. "Three of four" major labs failed to provide pre-deployment access, but which companies, which models, and what circumstances are not disclosed.
- No staff morale data post-rebrand. The February 2025 rebrand from "AI Safety Institute" to "AI Security Institute" was widely criticized, but no departures in protest are known. Vocabulary analysis shows systematic removal of terms like "bias," "discrimination," and "accountability" from AISI materials.
- No open-source model governance strategy. The Frontier Trends Report shows the capability gap between proprietary and open-source models has narrowed to 4-8 months. Irving acknowledged this as "certainly a concern" but no policy exists.
- No independent oversight of AISI itself. No external audit of evaluation methodology, no parliamentary committee with ongoing oversight, no review of whether the voluntary approach is working.
- No FOIA transparency. AISI is reportedly not subject to the same FOIA obligations as other government bodies. Evaluation results shared with labs are not available to the public.
Recommended Reading
Geoffrey Irving on Cognitive Revolution podcast (March 2026, 2hr+) -- The most candid window into how AISI's chief scientist actually thinks about AI risk. Irving speaks with remarkable honesty about correlated failures, the limits of current approaches, debate and formal methods, and why he left DeepMind. Essential for understanding what an exceptionally capable government scientist actually believes versus what he can say officially.
"Why AI Evaluation Regimes are bad" by Gabriel Alfour (LessWrong) -- The strongest argument that AISI's core activity actively harms safety by legitimizing deployment without enforcement. A provocative but logically coherent challenge.
"Beyond Benchmarks: On The False Promise of AI Regulation" (arXiv) -- The conceptual critique arguing benchmarks fundamentally cannot deliver what regulation requires.
Ian Hogarth, "We must slow down the race to God-like AI" (Financial Times, April 2023) -- The founding worldview. Essential for understanding the gap between AISI's original vision and current reality.
"Safety first?" by Ada Lovelace Institute -- The clearest institutional critique of AISI's voluntary framework and the bait-and-switch problem.