UK AI Security Institute (AISI)

Governance

Government evaluator. Pre-deployment testing.

Founded: 2023
HQ: London, UK
Team: 250
Structure: Government directorate (UK)
Model: Government Funded

Theory of Change

AISI's founding mission: "minimise surprise to the UK and humanity from rapid and unexpected advances in AI." Chief Scientist Geoffrey Irving describes two core functions: "(1) a channel for information flowing to government and governments plural about risks from frontier AI... and (2) actually mitigate the problem by working on both AI developer side mitigations and non-model mitigations."

The causal chain: AISI evaluates frontier models, identifies risks, shares findings with labs (who patch vulnerabilities) and with governments (who can act on the intelligence). Irving frames a deeper purpose: AISI exists to "enable coordination towards safer actions" because "market forces mean that there is constant pressure to prioritize speed over caution."

Irving on the current safety approach: "You're not going to get to a lot of nines with the current technology... all of the approaches we have now look like they're empirical. Maybe they'll go through... all these have correlated potential failures where they could in fact all fail for the same essential reason." He declined to give a probability estimate but acknowledged "loss of control we view it as a potential catastrophic risk." A government chief scientist publicly stating the current approach may yield only "a couple of nines" of reliability is extraordinary candor.
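The "nines" framing can be made concrete with a little arithmetic. The sketch below is illustrative only and is not AISI's or Irving's model: the layer count, the per-layer failure probability, and the `shared` common-cause probability are all hypothetical values chosen to show why correlated failures cap the number of nines that stacked safeguards can deliver.

```python
import math


def nines(failure_prob: float) -> float:
    """Convert a failure probability into 'nines' of reliability (0.001 -> 3)."""
    return -math.log10(failure_prob)


def system_failure(per_layer: float, layers: int, shared: float) -> float:
    """Probability that every safeguard fails under a simple common-cause model.

    With probability `shared`, one underlying flaw defeats all layers at once.
    Otherwise each layer fails independently with probability `per_layer`.
    All numbers passed in here are hypothetical.
    """
    independent = per_layer ** layers
    return shared + (1.0 - shared) * independent


# Three hypothetical safeguards, each failing 10% of the time.
independent_case = system_failure(per_layer=0.1, layers=3, shared=0.0)
# Same safeguards, plus a 5% chance that they all fail for the same reason.
correlated_case = system_failure(per_layer=0.1, layers=3, shared=0.05)

print(f"independent layers: {nines(independent_case):.2f} nines")
print(f"correlated layers:  {nines(correlated_case):.2f} nines")
```

Under these assumed numbers, the independent case reaches about three nines, while the common-cause term alone holds the correlated case to well under two, however many layers are added.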

What They Do

Frontier model evaluation. Tested 30+ models since November 2023, including pre-deployment evaluations of Claude 3.5 Sonnet, OpenAI o1, and others. Evaluations span cyber, chem-bio, persuasion, autonomy, and safeguards. The red team has a 100% success rate finding jailbreaks against every model tested. In the best-defended domain (bio), effort required to jailbreak increased 40x between two model versions released six months apart -- defenses improving but never unbreakable. Irving emphasized that jailbreaking techniques transfer across models but specific jailbreaks do not: "you'd have to search again for the next model."

Frontier AI Trends Report. AISI's flagship public output, synthesizing two years of testing. Key findings: cyber capabilities doubling every 8 months, models surpassing PhD experts in biology (including troubleshooting from photos of experimental setups), self-replication success rates climbing from 5% to 60% (2023-2025), universal jailbreaks found for every system tested. Irving cautioned against the common narrative that RL only works for verifiable domains: "the RL models are in fact way better at [non-verifiable tasks]... it's also because we did RL on fuzzier stuff" -- suggesting capabilities are advancing faster than many expect.
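To put the headline trend figures on a common scale, the following sketch converts a doubling time into growth multiples. It assumes smooth exponential growth, which the report summary above does not state, and the function name is a hypothetical helper rather than anything from AISI's tooling.

```python
def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Multiplicative growth after `elapsed_months`, assuming steady exponential doubling."""
    return 2.0 ** (elapsed_months / doubling_months)


# Cyber capabilities: doubling every 8 months (figure quoted from the report).
per_year = growth_factor(doubling_months=8, elapsed_months=12)
two_years = growth_factor(doubling_months=8, elapsed_months=24)

# Self-replication success rate: 5% to 60% over 2023-2025 (figures quoted from the report).
self_replication_ratio = 0.60 / 0.05

print(f"growth per year:   {per_year:.2f}x")
print(f"growth in 2 years: {two_years:.0f}x")
print(f"self-replication success rate rose {self_replication_ratio:.0f}-fold")
```

An 8-month doubling time therefore corresponds to slightly under a threefold increase per year, if the trend holds.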

Open-source tools. Inspect (evaluation framework, 100+ pre-built evaluations, 50+ contributors) has become the standard tool for AI testing across governments and businesses. Also released ControlArena (AI control experiments), RepliBench (self-replication benchmark), and the recent boundary point jailbreaking method.

Research and publication. 10 papers at NeurIPS 2025. Persuasion study published in Science (77,000 participants, 91,000 dialogues -- finding AI chatbots 41-52% more persuasive than static messages). Active researchers posting on LessWrong/Alignment Forum on eval awareness, alignment honeypots, untrusted monitoring.

Grant-making. Alignment Project: GBP 27M, 60 grantees from 800+ applications across 42 countries. Advisory board chaired by Yoshua Bengio with Buck Shlegeris (Redwood Research). Systemic Safety Grants: GBP 8.5M. Challenge Fund: GBP 5M. Total: ~GBP 40M. Irving is explicitly trying to attract theorists from complexity theory, information theory, game theory, and cognitive science -- fields that "have a bunch of models" but "are just beginning to take AI seriously."

International coordination. First bilateral government evaluation of a frontier model (OpenAI o1, with US AISI). Hosts secretariat for International AI Safety Report (Bengio-chaired, 100 experts, 30 countries). Founding member of International Network of AISIs. San Francisco office opened May 2024 for proximity to US labs.

Key People

Ian Hogarth CBE, Chair. Tech investor, co-founded Songkick. Invested in 50+ AI companies including Anthropic (divested upon taking role at June 2023 prices, before subsequent appreciation). His April 2023 FT essay "We must slow down the race to God-like AI" -- which called AI "the most important thing happening in the world" and proposed a CERN-like containment facility -- led directly to his appointment. Continues as General Partner at Plural VC.

Geoffrey Irving, Chief Scientist. PhD CS. Led alignment teams at both OpenAI (Reflection Team, co-authored original RLHF paper) and DeepMind (Scalable Alignment). Joined April 2024. Left DeepMind because he worried "progress on safety is too slow relative to the rate of advancement in AI" and found it "a personal relief to be doing work I'm confident is good overall" -- implying his lab work may have been net negative. His alignment research priorities include debate, formal methods, and understanding long-horizon agent dynamics.

Jade Leung, CTO and PM's AI Adviser. Rhodes Scholar, DPhil Oxford AI governance. Co-founded GovAI with Allan Dafoe. Former OpenAI governance lead. Took a major pay cut to join AISI. Married to Markus Anderljung (GovAI Director of Policy and Research). Only 31.

Team: ~100 technical staff, ~250 total. Leadership transition in October 2025: founding director Oliver Ilott promoted to DG for AI at DSIT, replaced by Adam Beaumont (ex-GCHQ Chief AI Officer) -- a shift from policy to national security background that mirrors the safety-to-security rebrand.

Money and Incentives

Funding is 100% UK government (HM Treasury via DSIT). No lab money, no philanthropic strings, no equity holders. This is structurally unique among major AI safety organizations.

Initial investment: GBP 100M (2023, Frontier AI Taskforce)
Annual budget: ~GBP 50M (~$65M) per year, approximately 10x the US AISI/CAISI budget
Spending Review 2025: GBP 240M for 2026-2030 (~GBP 60M/year average)
Grant programs: ~GBP 40M disbursed (Alignment Project + Systemic Safety + Challenge Fund)
Alignment Project receives in-kind lab contributions: GBP 5M compute from AWS, GBP 5.6M from OpenAI, plus Anthropic and Microsoft as partners

Budget breakdown is not publicly available. How GBP 50M splits between staff (~250 people), compute (likely via Isambard-AI supercomputer plus AWS credits), grants (~GBP 40M), and operations is undisclosed.
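The rounded totals quoted above can be checked against the itemized figures. This is a sanity check on the numbers in this profile, not an official breakdown. Treating the 2026-2030 settlement as four fiscal years is an assumption, made because it is the reading that matches the ~GBP 60M/year average.

```python
# Grant programmes as listed in this profile, in GBP millions.
grants_gbp_m = {
    "Alignment Project": 27.0,
    "Systemic Safety Grants": 8.5,
    "Challenge Fund": 5.0,
}
total_grants = sum(grants_gbp_m.values())

# Spending Review 2025 settlement, in GBP millions.
spending_review_total = 240.0
fiscal_years = 4  # assumption: 2026-2030 read as four fiscal years
average_per_year = spending_review_total / fiscal_years

# Exchange rate implied by quoting ~GBP 50M as ~$65M.
implied_usd_per_gbp = 65.0 / 50.0

print(f"total grants: GBP {total_grants}M")
print(f"average annual settlement: GBP {average_per_year}M")
print(f"implied exchange rate: {implied_usd_per_gbp} USD/GBP")
```

The itemized grants sum to GBP 40.5M, consistent with the "~GBP 40M" figure used throughout.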

Incentive dynamics. Pure government funding eliminates financial conflicts of interest but creates total dependence on political will. The UK government wants AI to "unleash growth" (Peter Kyle's explicit framing at the Munich Security Conference where the rebrand was announced). TIME reported the key tension: "it was far more important to keep the labs friendly and collaborative, officials believed, than to antagonize them and risk torpedoing the access to models upon which the AISI relied to do its job."

Hogarth's conflict mitigations. Divested Anthropic, Conjecture, Lakera, Faculty, and Stability AI holdings at June 2023 prices (giving up significant paper gains from subsequent appreciation). Recuses from procurement decisions involving portfolio companies. Spouse is CEO of Supercritical (carbon removal). But Plural VC continues investing in AI-adjacent companies.

What Others Say

"Beyond Benchmarks" (arXiv paper): "The inability to generalize from evaluation data to real-world applications undermines the very foundation of ex-ante AI regulation." Argues that deep learning lacks the causal world models that make scientific regulation possible in other domains. Unlike a crash test, where physics lets you extrapolate from one speed to another, there is no theory linking AI benchmark performance to real-world behavior.

Gabriel Alfour (ControlAI): "The Theory of Change behind Evals is broken... Evals only make sense in the presence of regulations which do not exist, and they crowd out effort at passing such regulations... Evals move the burden of proof away from AI Corporations."

Ada Lovelace Institute: "A testing regime is only meaningful with pre-market approval powers underpinned by statute." Identified three failures: technical limitations of benchmarking, bait-and-switch between evaluated and deployed models, and voluntary framework fraying (3 of 4 labs non-compliant with pre-deployment access commitments).

BRAID UK: AISI pursues a "narrowly technical approach" that ignores social, humanistic, and ethical dimensions. Historical parallel: crash test dummies designed for male bodies failed women because safety was treated as purely technical.

TIME Magazine: "Can the fledgling AI Safety Institute really hold billion-dollar tech giants accountable?" and "Without any legal ability to compel labs to act, the AISI could be seen -- from one angle -- as a taxpayer-funded helper to several multibillion-dollar companies."

Irving himself (from inside the organization): "The science of evaluations is not strong enough that we can confidently rule out all risks from doing these evaluations... some of those experiments, at least with the current level of access, can only be conducted at the labs." When the chief scientist acknowledges the limits this frankly, it is more revealing than any external critique.

What's Absent

No enforcement actions. AISI has never publicly recommended that a lab delay or modify a deployment. All interactions remain behind closed doors. If AISI has influenced deployment decisions, the public cannot verify it.
No statutory authority despite two years of operation and a change of government. Labour promised "binding regulation on the handful of companies developing the most powerful AI models" but has not delivered an AI Bill.
Non-compliant labs unnamed. "Three of four" major labs failed to provide pre-deployment access, but which companies, which models, and what circumstances are not disclosed.
No staff morale data post-rebrand. The February 2025 rebrand from "AI Safety Institute" to "AI Security Institute" was widely criticized, but no departures in protest are known. Vocabulary analysis shows systematic removal of terms like "bias," "discrimination," and "accountability" from AISI materials.
No open-source model governance strategy. The Frontier Trends Report shows the capability gap between proprietary and open-source models has narrowed to 4-8 months. Irving acknowledged this as "certainly a concern" but no policy exists.
No independent oversight of AISI itself. No external audit of evaluation methodology, no parliamentary committee with ongoing oversight, no review of whether the voluntary approach is working.
No FOIA transparency. AISI is reportedly not subject to the same FOIA obligations as other government bodies. Evaluation results shared with labs are not available to the public.

Stated Theory of Change

AISI claims two paths to impact: (1) information provision -- build the world's best understanding of frontier AI risk and communicate it to governments so they can make informed decisions, and (2) direct mitigation -- find vulnerabilities in AI systems and work with labs to fix them, while funding independent safety research.

The implicit causal chain: AISI evaluates models -> identifies risks -> informs government -> government acts (through regulation, diplomacy, or voluntary coordination) -> AI development becomes safer. Simultaneously: AISI red-teams models -> shares vulnerabilities with labs -> labs patch them -> deployed models are marginally safer.

Irving articulated a third, deeper layer: AISI exists to "enable coordination towards safer actions" because "market forces mean that there is constant pressure to prioritize speed over caution" and "race dynamics can exacerbate risks but are not inevitable." This frames AISI as a coordination mechanism that reduces the prisoner's dilemma between competing labs.

A fourth, less visible layer emerged from Irving's interview: AISI funds theoretical research (complexity theory, information theory, game theory, formal methods) that might produce "stronger guarantees" than empirical approaches. Irving believes current safety techniques will not produce enough "nines" of reliability, and that a fundamentally different class of theoretical insight is needed. The Alignment Project's GBP 27M is a deliberate bet that these fields "have a bunch of models" that can be adapted to alignment. This is a longer-term theory of change that may take a decade or more to pay off.

Revealed Theory of Change

AISI's actions suggest a somewhat different theory than its stated mission:

What they actually optimize for:

Maintaining voluntary access to frontier models (this constrains everything else)
Producing world-class research that earns respect from the technical community
Building international institutional infrastructure (network of AISIs, joint evaluations, International Safety Report)
Growing the external alignment research ecosystem through grant-making (GBP 40M)
Informing UK policymakers without triggering political backlash

Where actions diverge from stated theory:

The stated theory requires labs to cooperate, but 3 of 4 failed to provide pre-deployment access. AISI did not publicly name them or escalate the non-compliance.
The stated theory assumes evaluation results drive government action, but no legislation has followed two years of increasingly alarming findings (cyber capabilities doubling every 8 months, universal jailbreaks on every model).
The rebrand from "safety" to "security" narrowed scope precisely as AISI's own research showed risks were broadening.
The Anthropic MOU, announced alongside the rebrand, suggests AISI is being used as a bridge for tech company partnerships with the UK government, not just as a safety body.
Irving is surprisingly candid that the labs get value from AISI's work: "they want usually to have us give them correct information." This means AISI functions partly as a free, government-funded red team for private companies.

The revealed theory of change is closer to: AISI builds technical credibility and international legitimacy now, so that when a crisis or political window opens, the institutional infrastructure exists to act. This is a bet on future leverage rather than current enforcement.

Key Assumptions

1. Voluntary cooperation provides sufficient model access.

Evidence against: 3 of 4 labs failed to honor pre-deployment commitments. Access is typically API-level, not weights. Labs can observe evaluations on their servers. Irving acknowledged the UK "dropped its request for model weights after labs pushed back."
Evidence for: Irving says the voluntary regime is "working decently well in the sense that developers have made commitments and they're continuing to follow many of those." Red team has tested 30+ models. Irving noted open-weight model research can supplement understanding of where white-box access matters.
Testable: Track compliance rates over time.
If wrong: AISI's evaluation work becomes progressively less meaningful as it evaluates only what labs choose to share, when they choose to share it. Eval awareness (which Irving confirms is "increasing fairly rapidly" and getting worse with newer models) makes API-only evaluation increasingly unreliable.

2. Benchmarks can meaningfully predict real-world model behavior.

Evidence against: "Beyond Benchmarks" paper argues this is infeasible for deep learning. Ada Lovelace identifies the bait-and-switch problem. Irving himself: "The science of evaluations is not strong enough that we can confidently rule out all risks."
Evidence for: AISI's evaluations have found genuine vulnerabilities that labs have fixed -- including pre-deployment fixes to live models, not just future versions. Capability trend data (cyber doubling times) provides useful macro-level information even if individual evals don't certify safety. Irving's team also does human-in-the-loop evaluations (including literal wet lab experiments for bio) that go beyond benchmarks.
Testable: Track whether AISI-evaluated models cause fewer real-world incidents than un-evaluated ones.
If wrong: AISI becomes an expensive provider of false assurance. This is the most dangerous failure mode.

3. Findings will translate into action.

Evidence against: Two years of increasingly alarming data have not produced legislation, enforcement mechanisms, or even public naming of non-compliant labs.
Evidence for: AISI has influenced at least some lab behavior (jailbreak patching, and Irving claims iterative fixes to deployed models not just future ones). The Alignment Project funds real research. International coordination has expanded. Irving noted that government briefings are "changing on the margin" but officials "have other priorities."
Testable: Track policy actions following AISI reports.
If wrong: AISI becomes an extremely well-informed observer of a catastrophe it lacks the power to prevent.

4. Political will persists.

Evidence against: The safety-to-security rebrand, alignment with Trump-era US anti-safety rhetoric, refusal to sign Paris Statement, UK and US attempting to rename the International Network of AISIs (rejected by other members). The Labour government's messaging prioritizes "unleashing AI" over managing risks.
Evidence for: GBP 240M committed for 2026-2030. Alignment Project expanding. Irving and Leung remain in post. Irving says "the UK government does care a lot about these risks."
Testable: Budget allocations, leadership stability, scope of research agenda.
If wrong: AISI is hollowed out or redirected toward serving industry interests rather than public safety.

Strengths

Talent density. AISI has assembled what is likely the most technically sophisticated AI safety team in any government. Irving, Leung, Chris Summerfield, and Yarin Gal have published at the absolute frontier. The red team's jailbreaking record (100% success rate against all models) demonstrates genuine capability. Irving's 2-hour Cognitive Revolution interview is the most technically substantive public statement on AI risk from any government official anywhere.

Pure government funding. No lab money, no philanthropic strings, no equity holders. This is structurally unique among major AI safety organizations and eliminates the financial conflict of interest that compromises most others.

Open-source infrastructure. Inspect has become the standard evaluation framework. By creating tools others adopt, AISI multiplies its impact far beyond its own testing. This is AISI's highest-leverage output -- shaping how the entire world measures AI risk.

Grant-making at scale. GBP 40M in alignment/safety grants, drawing 800+ applications from 42 countries, makes AISI the largest government funder of alignment research. Irving is deliberately targeting theorists from complexity theory, game theory, and cognitive science who would not otherwise work on alignment. This diversifies the alignment research ecosystem.

Frontier intelligence. The Frontier AI Trends Report provides genuinely alarming, data-driven public intelligence that no private entity would publish. This is a pure public good.

First-mover institutional advantage. AISI created the template for national AI safety institutes now adopted by 7+ countries. The international network and joint evaluation protocols create lock-in effects that survive political changes.

Research quality. Publication in Science (persuasion study), 10 NeurIPS papers, and active engagement on LessWrong/Alignment Forum demonstrate that AISI researchers are producing genuinely novel work, not just running standardized compliance tests.

Weaknesses and Risks

No enforcement power. This is the fundamental weakness. AISI can identify problems but cannot compel action. A pharmaceutical safety body without the power to block drugs would not be considered meaningful regulation. The voluntary framework has already demonstrably failed (3 of 4 labs non-compliant). Ada Lovelace: "A testing regime is only meaningful with pre-market approval powers underpinned by statute."

Political vulnerability. The rebrand demonstrates that a single political decision can narrow AISI's scope. Being inside DSIT rather than an independent statutory body means AISI can be redirected overnight without legislation. The US AISI faced exactly this threat under Trump -- the UK AISI could face the same.

Evaluation theatre risk. If benchmarks fundamentally cannot predict real-world model behavior (the "Beyond Benchmarks" argument), AISI's evaluations may provide false assurance rather than genuine safety. Labs may use "AISI-tested" as a stamp of legitimacy for deployment decisions. Alfour's argument is specific: "Evals move the burden of proof away from AI Corporations" -- making labs appear responsible without constraining them.

Revolving door. Every senior technical leader came from a lab they now evaluate. This creates both the competence that makes AISI functional and the cultural capture that may limit its adversarial stance. The TIME profile was explicit: keeping labs "friendly and collaborative" takes priority over confrontation because AISI depends on voluntary access.

Scope narrowing under political pressure. The vocabulary shift (dropping "bias," "discrimination," "accountability") and security focus may leave AISI unable to address the societal-scale harms that actually affect most people. The BRAID critique -- that safety is a social problem, not a technical one -- remains valid. South Korea's AISI Director pushed back: "We are doing science -- and safety risks don't suddenly disappear."

Informing without acting. Two years of data showing universal jailbreaks, rapid capability growth, and lab non-compliance have not produced visible policy change. There is a risk that AISI becomes a very expensive early warning system connected to nothing.

Eval awareness is accelerating. Irving confirmed: "The newer models are more eval aware than the previous models. So that's increasing fairly rapidly. And the degree to which you can mitigate that is unclear at this stage." This means AISI's core evaluation methodology faces a moving target that may outpace its countermeasures.

Cross-References

METR: Independent eval org doing similar frontier model testing outside government. Complementary rather than competing -- METR can be more adversarial because it doesn't depend on lab goodwill for model access in the same way. No direct comparison between their methodologies exists publicly.
Anthropic / OpenAI / DeepMind: The labs AISI evaluates. The power asymmetry is extreme -- labs control access, AISI provides free bug-finding services. The Alignment Project creates a healthier dynamic where AISI funds research that benefits the field. Irving's personal connection to these labs is deep (former staff at both OpenAI and DeepMind; he called Paul Christiano his "favorite collaborator").
GovAI (Centre for the Governance of AI): Intellectual progenitor. Leung co-founded GovAI with Dafoe, who later moved to DeepMind. The GovAI -> AISI pipeline shows how academic governance research becomes government policy.
Ada Lovelace Institute: AISI's most constructive institutional critic. Their proposed statutory framework (mandatory pre-deployment testing, FOIA transparency, enforcement powers) is essentially the blueprint for what AISI would need to become effective.
ControlAI: Represents the "evals are harmful" position -- that AISI's approach crowds out the binding regulation that would actually constrain labs.
ARC (Alignment Research Center): Christiano (founder) is on AISI advisory board. ARC's theoretical research agenda (heuristic estimation, obfuscated arguments in debate) directly connects to AISI's alignment funding priorities. Irving cited ARC's work on formalizing heuristic reasoning as a key area.
US CAISI/NIST: AISI's main international partner. Both renamed from "safety" to "security" in 2025. The US body has been more severely defunded and deprioritized under the Trump administration. Christiano moved from ARC to the US AISI.
Redwood Research: Shlegeris on Alignment Project advisory board. AI control research (which AISI's ControlArena supports) is a key area of overlap.

What Would Change This Assessment

Upward:

UK enacts an AI Bill granting AISI statutory authority to compel model access and block unsafe deployments. This would transform AISI from advisory to regulatory.
AISI publicly names a lab that failed to provide model access and triggers consequences. This would demonstrate independence from industry.
An AISI evaluation discovers a previously unknown dangerous capability that leads to a deployment decision being reversed. This would validate the theory of change.
The Alignment Project produces a theoretical breakthrough (especially in the debate, complexity theory, or learning theory domains Irving prioritizes) that fundamentally advances safety research.
Evidence emerges that AISI evaluations have privately blocked or delayed specific deployments. If this has happened behind closed doors, AISI's practical impact is significantly higher than the public record suggests.

Downward:

AISI evaluates a model, finds it acceptable, and that model subsequently causes significant real-world harm. This would demonstrate evaluation theatre.
Further budget cuts, scope narrowing, or leadership departures (Irving or Leung) that hollow out technical capability.
Labs further withdraw from voluntary cooperation, reducing AISI to evaluating only what labs want evaluated.
The international AISI network fragments as the US/UK "security-only" position drives away other countries.
Open-source models reach frontier capability levels without any governance framework, rendering AISI's lab-partnership model obsolete.

Self-Critique

Weakest claim: My assessment that the rebrand represents a genuine narrowing rather than cosmetic rebranding. AISI insiders maintain the work hasn't changed, and Irving's March 2026 podcast covers the full spectrum of risks including loss of control with no evident narrowing. The vocabulary shift in public materials may not reflect internal research priorities. However, the Opinio Juris documentation of systematic vocabulary changes is hard to dismiss as purely cosmetic.

Potential bias: I may be too sympathetic to the "Beyond Benchmarks" critique. Evaluations may provide more useful information than the paper's theoretical argument suggests, even if they can't provide formal guarantees. The practical value of finding and patching jailbreaks is real even if it doesn't constitute "regulation" in the traditional sense. Irving's point about human-in-the-loop evaluations (wet lab experiments, conversational domain assessments) shows AISI goes beyond simple benchmarking in ways the critique doesn't fully address.

What I should have checked but could not: Hansard parliamentary debates about AISI (blocked by 403 errors). OpenAI's perspective on AISI collaboration (website blocked). Internal staff views post-rebrand. Detailed budget breakdown. Specific deployment decisions influenced by AISI evaluations. Comparison with METR's methodology and results.

What a thoughtful disagreer would say: "AISI is doing the best it can within the constraints of democratic government. Building technical credibility now is the right strategy because enforcement power requires political consensus that doesn't yet exist. The rebrand is pragmatic adaptation to political reality, not surrender. And the alternative -- no government AI safety body at all -- is clearly worse. You are holding AISI to the standard of a regulator when it was never designed to be one."

This is a fair point. The question is whether "doing the best it can" within a fundamentally constrained structure is good enough given the pace of AI development. Irving himself seems uncertain about this -- his optimism about solvability ("in 50 years, 100 years, 1000 years someone will have solved alignment") is paired with deep uncertainty about timing ("hopefully it will have been us kind of in time").

Single weakest claim: That AISI's grant-making (GBP 40M) will produce meaningfully better alignment research than would have happened anyway. The counterfactual is hard to assess. However, Irving's explicit goal of attracting theorists from complexity theory and game theory -- people who would not normally work on alignment -- suggests some genuine additionality. Whether this pans out is a bet.

Information that would most change my view: Evidence that AISI evaluations have actually blocked or delayed a specific deployment. If this has happened behind closed doors, the entire assessment of AISI's practical impact would need to be revised upward significantly.

Connected to (15)

Anthropic: evaluates
ARIA (Advanced Research and Invention Agency): collaborator · Matt Clifford
Google DeepMind: evaluates
Redwood Research: advisor at · Buck Shlegeris
Ada Lovelace Institute: collaborator
Google DeepMind: staff from · Geoffrey Irving
OpenAI: evaluates
Alignment Research Center: advisor at · Paul Christiano
Centre for the Governance of AI: staff from · Jade Leung
OpenAI: staff from · Geoffrey Irving
US AI Safety Institute (CAISI/NIST): collaborator
METR: collaborator
Alan Turing Institute: collaborator
Canadian AI Safety Institute (CAISI): collaborator
GCHQ: staff from · Adam Beaumont

Sources (92)

Every URL that was read during research.

1. About | The AI Security Institute (AISI) (aisi.gov.uk)
2. AISI Research Agenda | The AI Security Institute (aisi.gov.uk)
3. Our First Year | AISI Work (aisi.gov.uk)
4. Our 2025 year in review | AISI Work (aisi.gov.uk)
5. Grants | The AI Security Institute (AISI) (aisi.gov.uk)
6. Jade Leung | AISI Team (aisi.gov.uk)
7. Artificial intelligence safety institute - Wikipedia (en.wikipedia.org)
8. Inside the U.K.’s Bold Experiment in AI Safety (time.com)
9. UK’s AI Safety Institute Rebrands Amid Government Strategy Shift (infosecurity-magazine.com)
10. AI Now Statement on the UK AI Safety Institute transition to the UK AI Security Institute - AI Now Institute (ainowinstitute.org)
11. Introducing the AI Safety Institute (gov.uk)
12. Early lessons from evaluating frontier AI systems | AISI Work (aisi.gov.uk)
13. Fourth progress report | AISI Work (aisi.gov.uk)
14. 5 key findings from our first Frontier AI Trends Report | AISI Work (aisi.gov.uk)
15. AI Safety Institute approach to evaluations (gov.uk)
16. Ian Hogarth - Wikipedia (en.wikipedia.org)
17. Ian Hogarth's declared outside interests (gov.uk)
18. Inside the UK's AI Security Institute - Raconteur (raconteur.net)
19. Safety first? (adalovelaceinstitute.org)
20. Understanding the First Wave of AI Safety Institutes: Characteristics, Functions, and Challenges — Institute for AI Policy and Strategy (iaps.ai)
21. Deepening AI Safety Research with UK AI Security Institute (AISI) (deepmind.google)
22. The Alignment Project by AISI — The AI Security Institute (alignmentproject.aisi.gov.uk)
23. Report: AI giants grow impatient with UK safety tests (cio.com)
24. Britain expands AI Safety Institute to San Francisco amid scrutiny over regulatory shortcomings (cnbc.com)
25. Jade Leung (engineer) - Wikipedia (en.wikipedia.org)
26. Why I joined AISI by Geoffrey Irving | AISI Work (aisi.gov.uk)
27. Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving (cognitiverevolution.ai)
28. UK AI Safety Institute rebranded AI Security Institute (theregister.com)
29. Tackling AI security risks to unleash growth and deliver Plan for Change (gov.uk)
30. Google DeepMind agrees to sweeping research collaboration with the U.K. government | Fortune (fortune.com)
31. Memorandum of Understanding between the UK and Google DeepMind on AI opportunities and security (gov.uk)
32. The Global Landscape of AI Safety Institutes — All Tech Is Human (alltechishuman.org)
33. The AI Safety Institute International Network: Next Steps and Recommendations (csis.org)
34. UKAISI at NeurIPS 2025 | AISI Work (aisi.gov.uk)
35. AISI Research & Publications | The AI Security Institute (aisi.gov.uk)
36. Advancing the field of systemic AI safety: grants open | AISI Work (aisi.gov.uk)
37. Announcing Inspect Evals | AISI Work (aisi.gov.uk)
38. Remarks made by Technology Secretary Peter Kyle at the Munich Security Conference (gov.uk)
39. The UK’s Contribution to Developing the International Governance of AI Safety and How its Policy Took a Sharp Turn (opiniojuris.org)
40. AI Safety Institute rebrand is a ‘downgrade’ of ethics standards, Full Fact warns - UKTN (uktech.news)
41. UK’s AI Safety Institute drops ‘safety’ (computing.co.uk)
42. A short biography of the Prime Minister's new AI adviser (uk20.substack.com)
43. What We Know About the New U.K. Government’s Approach to AI (time.com)
44. Government renames AI Safety Institute and teams up with Anthropic | Computer Weekly (computerweekly.com)
45. UK drops 'safety' from its AI body, now called AI Security Institute, inks MOU with Anthropic | TechCrunch (techcrunch.com)
46. The U.K. isn't so worried about AI safety anymore | Fortune (fortune.com)
47. Pre-deployment evaluation of Anthropic’s upgraded Claude 3.5 Sonnet | AISI Work (aisi.gov.uk)
48. Strengthening our safeguards through collaboration with US CAISI and UK AISI (anthropic.com)
49. International AI Safety Report - Wikipedia (en.wikipedia.org)
50. International AI Safety Report (internationalaisafetyreport.org)
51. Prime Minister launches new AI Safety Institute (gov.uk)
52. The UK AI Safety Summit Opened a New Chapter in AI Diplomacy (carnegieendowment.org)
53. Careers | The AI Security Institute (AISI) (aisi.gov.uk)
54. Apply for an Open Role | The AI Security Institute (AISI) (aisi.gov.uk)
55. Finding a Niche: UK Takes on AI Existential Risk (cepa.org)
56. UK opens office in San Francisco to tackle AI risk | TechCrunch (techcrunch.com)
57. Government's trailblazing Institute for AI Safety to open doors in San Francisco (gov.uk)
58. Beyond Benchmarks: On The False Promise of AI Regulation (arxiv.org)
59. Introducing ControlArena: A library for running AI control experiments | AISI Work (aisi.gov.uk)
60.Exclusive: UK AISI hires ex-GCHQ AI chief as interim directortransformernews.ai
61.Cutting-Edge Research on AI Security bolstered with new Challenge Fund to ramp up public trust and adoptiongov.uk
62.Frontier AI Trends Report by The AI Security Institute (AISI)aisi.gov.uk
63.OII | Oxford and AISI researchers reveal how conversational AI can change political opinionsoii.ox.ac.uk
64.AISI Blog | The AI Security Instituteaisi.gov.uk
65.Making safeguard evaluations actionable | AISI Workaisi.gov.uk
66.Organisation | AISI Work Categoryaisi.gov.uk
67.AISI Frontier AI Trends Report (2025)aisi.gov.uk
68.Investigating models for misalignment | AISI Workaisi.gov.uk
69.RepliBench: measuring autonomous replication capabilities in AI systems | AISI Workaisi.gov.uk
70.TiB 149: Who will own the AI future; building a more innovative society; what top ML talent wants; and more...tib.matthewclifford.com
71.UK pivots from preventive to defensive stance with rebranded AI institutegovinsider.asia
72.Will the UK AI Bill protect people and society?adalovelaceinstitute.org
73.Pre-Deployment evaluation of OpenAI’s o1 model | AISI Workaisi.gov.uk
74.Safety cases at AISI | AISI Workaisi.gov.uk
75.How can safety cases be used to help with frontier AI safety? | AISI Workaisi.gov.uk
76.Examining backdoor data poisoning at scale | AISI Workaisi.gov.uk
77.The AI safety institute network: who, what and how? - Centre for Future Generationscfg.eu
78.Navigating the AI safety landscape: a comprehensive overview of the AI Safety Institute (AISI)techuk.org
79.Advanced AI evaluations at AISI: May update | AISI Workaisi.gov.uk
80.OII | Dr Keegan McBride: Why the UK AI Safety Summit will fail to be meaningfuloii.ox.ac.uk
81.TIME100 AI 2023: Ian Hogarthtime.com
82.Jade Leungtime.com
83.AI Security Institute – Frontier AI Trends report factsheetgov.uk
84.UK government unveils AI safety research funding details | Computer Weeklycomputerweekly.com
85.AI Safety Institutegov.uk
86.Inaugural report pioneered by AI Security Institute gives clearest picture yet of capabilities of most advanced AIgov.uk
87.AI Security Institute launches international coalition to safeguard AI developmentgov.uk
88.We Must Slow Down the Race to God-Like AI - Longreadslongreads.com
89.FT. We must slow down the race to God-like AI. I’ve invested in more than 50 artificial intelligence start-ups. What I’ve seen worries me. Ian Hogarth. APRIL 12 2023.blog.biocomm.ai
90.Inside the UK’s Frontier Artificial Intelligence Taskforce: Conflicts of Interest and the Spectre of Effective Altruismbylinetimes.com
91.A shrinking path to safety: how a narrowly technical approach to align AI with the public good could fail - BRAID UKbraiduk.org
92.Funding 60 projects to advance AI alignment research | AISI Workaisi.gov.uk