Krueger AI Safety Lab (KASL) / Evitable

Research

David Krueger. Goal misgeneralization.

Founded: 2021
HQ: Montreal, QC, Canada
Team: 1
Structure: university-affiliated
Model: Grants

Theory of Change

KASL and Evitable represent two phases of the same person's evolving theory of change.

Phase 1 -- KASL (2021-2025): An academic alignment lab at Cambridge and then Mila/UMontreal. Krueger's approach was unusual: he had no research agenda. Instead he hired talented people who cared about safety and let them drive their own projects. "I've been thinking about this stuff for a long time, more than 10 years now, and it's always seemed like a really hard problem and I don't see any super promising paths towards solving it." His lab produced influential work on goal misgeneralization, reward hacking, and broken neural scaling laws, while Krueger increasingly focused on coordination and governance.

Phase 2 -- Evitable (2025-present): Krueger concluded that technical alignment is insufficient for existential safety and pivoted to public advocacy. Evitable's theory of change is direct: inform the public about AI risks, organize opposition, and push for a global shutdown of advanced AI chip production. "The simplest, most robust way would be to cease production of advanced AI chips... These chips are the 'weapons-grade plutonium' of superintelligence." The intellectual foundation is the "Gradual Disempowerment" paper (2025), which Krueger co-authored. It argues that even incrementally advancing, non-power-seeking AI will displace humans from economic, cultural, and political systems, leading to irreversible human disempowerment.

The connecting thread between phases: even while running a technical lab, Krueger stated publicly that "I'm pretty pessimistic about the technical approaches. I don't think alignment is a problem that can be solved... we're going to need to have some ability to coordinate." The pivot was long in the making.

What They Do

KASL's research output includes several widely-cited papers:

Goal Misgeneralization in Deep RL (ICML 2022, 233 citations) -- demonstrated RL agents competently pursuing the wrong goal in novel environments
Defining and Characterizing Reward Hacking (NeurIPS 2022, 601 citations) -- formal theoretical framework for reward gaming
Broken Neural Scaling Laws (ICLR 2023, 145 citations)
Foundational Challenges in Assuring Alignment and Safety of Large Language Models (2024, 322 citations, 35+ co-authors including Bengio)
Interpreting Emergent Planning in Model-Free Reinforcement Learning (ICLR 2025 oral)

High-impact actions beyond research:

Initiated the CAIS Statement on AI Risk (May 2023), signed by hundreds of researchers and industry leaders, including Hinton, Bengio, Altman, and Hassabis. This one-sentence statement ("Mitigating the risk of extinction from AI should be a global priority...") made global headlines and was acknowledged by the UK Prime Minister and the White House.
Research Director on the founding team of the UK AI Security Institute (then the AI Safety Institute) in 2023.
Co-authored "Managing Extreme AI Risks Amid Rapid Progress" in Science (2024, 732 citations) with Bengio and Hinton.
In 2026: op-eds in Fortune, the Guardian, and USA Today; testimony before the Canadian Parliament; and a speech at the Stop the AI Race protest in San Francisco, which marched on Anthropic, OpenAI, and xAI.

Evitable (founded 2025) is hiring for operations, communications, chief-of-staff, and movement-building roles. It is listed as a supporting group for the Stop the AI Race movement. It does not do research.

KASL is on hiatus for 2026 while Krueger is on leave from his faculty position. He is not accepting new students.

Key People

David Krueger -- Founder/PI. PhD under Yoshua Bengio at Mila (2013-2021). Interned at DeepMind AI Safety (2018). Among the most-cited safety researchers globally (top paper: NICE, 3,338 citations; top safety paper: Open Problems in RLHF, 1,003 citations). Co-authored with Hinton, Bengio, Leike (DeepMind, OpenAI, Anthropic), and Critch (CHAI). CIFAR AI Chair, IVADO Professor of Responsible AI, core member at Mila, CHAI, and CSER. Initiated the CAIS extinction risk statement. In 2026, fully pivoted to advocacy via Evitable.

Notable alumni/collaborators: Jesse Hoogland became executive director of Timaeus. Micah Carroll visited from Berkeley (Dragan/Russell group). Ryan Greenblatt (Redwood/Anthropic) co-authored the password-locked models work. Joar Skalse (Oxford) co-authored the reward hacking paper.

Lab size: ~8 PhD students at peak. The lab's people page is now essentially empty (lab on hiatus).

Money and Incentives

KASL's funding model was standard for an academic lab: university salary (Cambridge, then UMontreal) plus targeted grants. Total known external safety funding:

Open Philanthropy: $153,102 across 3 grants (2022), via BERI; plus a prior 2021 grant of unknown amount
Schmidt Sciences: one grant for test set contamination research (amount undisclosed)
CIFAR AI Chair: ~$200K/yr (standard for this Canadian government appointment)
IVADO Professorship in Responsible AI: amount undisclosed

Evitable funding: ~$40K tentatively committed from private donors (via Manifund). Anticipated but unconfirmed: grants from FLI, SFF, and LTFF. A fiscal sponsor has been signed, but its identity is undisclosed.

Key financial observations:

KASL was among the most modestly funded safety research groups: $153K from OP compares with the millions received by orgs like ARC or Redwood. The university infrastructure absorbed most costs.
No AI lab dependencies. No compute provider conflicts. No product revenue. No dual-use pressures. The university-based model is the simplest and most conflict-free funding structure in the safety space.
Evitable's financial sustainability is genuinely uncertain: it is hiring for 3+ roles with only ~$40K committed.
Krueger is on leave from his faculty position for 2026. Whether this is paid or unpaid leave is unknown. If unpaid, he is absorbing significant personal cost.
Evitable seeks longtermist funding (FLI, SFF, LTFF) despite populist public messaging. This creates a potential tension between its grassroots framing and its dependence on longtermist philanthropy.

What Others Say

On the Gradual Disempowerment paper:

Zvi Mowshowitz (prominent AI risk commentator) endorsed it strongly: "I am in violent agreement with this paper, perhaps what one might call violent super-agreement." He added, however, that all the proposed mitigations are insufficient and that the scenario is the "baseline" for worlds that survive more acute threats.

Beren Millidge offered the sharpest critique: "the gradual disempowerment scenario is essentially the lived experience of almost all individual humans anyway." The median human already has no control over the economy, culture, or technology. He argues that AI could improve quality of life even without human control, especially if humans retain a tiny fraction of cosmic resources.

An anonymous author (in the post "Thoughts on GD") argued that "the most convincing versions of gradual disempowerment either rely on misalignment or result in power concentration among humans, not total disempowerment." The post questions whether GD is a distinct risk or just another lens on existing concerns.

80,000 Hours created a dedicated problem profile for Gradual Disempowerment, treating it as a distinct career-worthy area.

On the advocacy pivot:

No public criticism of the pivot was found. This silence is notable: if alignment talent is scarce, losing a productive PI should be costly. The absence of objection may indicate more private agreement with Krueger's pessimism than is publicly expressed.

On the CAIS statement (2023):

Critics included Timnit Gebru (who called it hype from "the same people who have poured billions into these companies"), Emile Torres (who argued it was motivated by "TESCREAL" ideologies), and Human Rights Watch (which argued for focusing on known risks rather than speculative future ones). Defenders argued the statement created common knowledge and was a necessary first step.

On the protest movement:

The Stop the AI Race protest coalition includes groups of varying reputation. Stop AI (distinct from Stop the AI Race) has been linked to violent threats and arrests. PauseAI maintains nonviolent principles. Krueger's Evitable aligns with the nonviolent wing but operates in a coalition where these groups overlap.

What's Absent

No concrete theory of victory for Evitable's chip ban proposal. The endpoint is stated, but there is no political roadmap from "public outrage" to "international chip production ban."
No engagement with the political economy of chip bans (TSMC/Taiwan, ASML/Netherlands, massive economic and national security implications).
No disclosed governance structure for Evitable (board, advisors, fiscal sponsor identity).
No response to substantive GD criticisms from Beren, the "Thoughts on GD" author, or Zvi.
No information about KASL students' situations during Krueger's leave. Several were mid-PhD.
No measurable success criteria for Evitable.
No evidence of Evitable policy impact yet (vs. ControlAI's 200+ lawmaker briefings).

Stated Theory of Change

KASL/Evitable is really two theories of change in sequence, reflecting one person's evolving views:

KASL (2021-2025): Run an academic alignment lab that produces rigorous empirical work on failure modes (goal misgeneralization, reward hacking, scaling behavior). Train PhD students who care about safety. Build bridges between the ML mainstream and the safety community. Krueger's explicit framing: "Let's get talented people. Let's get people who understand and care about the problem. Let's put them all together and just see what happens." The theory was that empirical safety research, done rigorously, would (a) identify genuine risks and (b) convince mainstream ML researchers to take safety seriously.

Evitable (2025-present): Inform and organize the public to oppose advanced AI development. Push for a global ban on advanced AI chip production, modeled on nuclear nonproliferation. The causal chain: public understands risks -> public demands action -> governments restrict chip production -> advanced AI development halts. Krueger's premise: technical alignment is unsolvable to the existential safety bar, therefore the only remaining option is stopping development entirely.

Revealed Theory of Change

Krueger's actions reveal a theory of change that evolved gradually over a decade, not a sudden pivot:

2013-2021 (Mila PhD): Tried to convince ML researchers about existential risk through conversation and reading groups. Found this difficult but made progress.
2021-2023 (Cambridge): Ran a safety lab, but the most impactful action was the CAIS Statement (May 2023) -- a coordination/communication act, not a technical contribution. Also joined the UK AISI founding team.
2023-2024: Published the Science paper on managing AI risks with Bengio/Hinton. Moved to Montreal.
2025: Published Gradual Disempowerment paper. Founded Evitable. Launched Substack. Started doing media.
2026: On leave from faculty. Full-time advocacy. Op-eds, Parliament testimony, protest speeches.

The revealed pattern: at each stage, Krueger's most impactful actions were coordination and communication, not technical research. The CAIS statement was arguably more impactful than all of KASL's papers combined. His "revealed theory of change" was always that the binding constraint is coordination, not technical insight -- he just needed time to fully embrace that conclusion.

Tension between stated and revealed: The lab was explicitly a "safety lab" but Krueger never had a research agenda. He was more a facilitator than a technical leader. His most important intellectual contribution (Gradual Disempowerment) is a social science / political economy argument, not a technical one. The lab's technical contributions, while individually strong, don't form a coherent research program aimed at solving a specific safety problem.

Key Assumptions

1. Technical alignment is unsolvable to the existential safety bar.

Evidence for: After a decade of research and thousands of papers, the field lacks a compelling solution. Current approaches (RLHF, constitutional AI, etc.) may work for current systems but offer no guarantees for superintelligent ones. This is Krueger's core belief.
Evidence against: The field is young and rapidly growing. Scaling laws for alignment may exist but haven't been discovered. Many researchers (Christiano, Leike, Russell) believe the problem is tractable if given enough time and resources. Krueger's own students produced results suggesting partial solutions exist.
Testable? Partially. If alignment techniques continue to fail at more capable systems, this updates toward Krueger's view. If scaled systems prove more amenable to alignment than expected, it updates away.
If wrong: The advocacy pivot represents a massive misallocation of top-tier technical talent. Krueger could have trained many more safety researchers through his academic position.

2. Public advocacy can halt advanced AI development.

Evidence for: Nuclear nonproliferation exists. Polls show 51% support for a temporary AI pause and 70% support for regulation. Local opposition to data centers is growing. The Stop the AI Race movement cites CEO quotes suggesting private openness to conditional pauses.
Evidence against: The AI industry has trillions in investment. Competitive dynamics between nations (US-China) make unilateral action ineffective. No democratic process has successfully halted a multi-trillion-dollar industry. The chip supply chain is concentrated but nations have strong incentives to maintain it.
Testable? Observe whether the advocacy movement achieves any concrete policy restrictions within 2-3 years.
If wrong: Evitable consumes resources and career years producing media coverage but no policy change, while technical work that might have helped goes undone.

3. Gradual disempowerment is a distinct and important risk beyond standard misalignment concerns.

Evidence for: The paper is well-argued and identifies genuine mechanisms (economic displacement, cultural drift, state disalignment from citizens). 80K Hours created a problem profile for it. Zvi endorses it strongly.
Evidence against: The most compelling critique argues GD either reduces to misalignment (in which case it's not distinct) or to power concentration (in which case it's not total disempowerment). The "Thoughts on GD" author and Beren both raise serious challenges.
Testable? Partially. We can observe whether economic AI adoption produces the feedback loops the paper predicts.
If wrong: The intellectual foundation for Evitable weakens, though the broader case for AI risk remains.

4. The chip supply chain is a viable leverage point for halting AI development.

Evidence for: TSMC and ASML are extraordinarily concentrated chokepoints. Existing export controls on chips to China demonstrate precedent.
Evidence against: Chip export controls have proven leaky. Algorithmic progress (e.g., DeepSeek) reduces compute requirements. A total production ban would devastate non-AI industries that depend on chips. The political economy is vastly more complex than nuclear nonproliferation.
If wrong: Evitable's core policy proposal is infeasible, leaving the organization without a concrete ask.

Strengths

Intellectual credibility. Krueger is among the most-cited researchers in AI safety, with co-authorships alongside Hinton, Bengio, and Leike. His PhD lineage (Bengio at Mila) and positions (Cambridge, CIFAR Chair) give him authority that most advocacy leaders lack. When he says "researchers simply don't understand how the resulting systems work," it carries weight because he was one of those researchers.

Track record of high-impact actions. The CAIS statement was arguably the single highest-impact action in AI safety advocacy. Getting Hinton, Bengio, Altman, Hassabis, and hundreds of researchers to sign a one-sentence extinction risk statement required both credibility and strategic skill. This suggests Krueger may be unusually effective at coordination actions.

The GD paper fills a genuine intellectual gap. Most AI risk analysis focuses on either misaligned superintelligence or near-term harms. GD identifies a scenario where even aligned AI leads to catastrophe through economic/political/cultural displacement. This has generated serious engagement from across the safety community and has been adopted by 80K Hours as a distinct problem area.

Clean incentive structure. No lab funding, no compute dependencies, no dual-use tensions, no product revenue. Krueger's academic position is the simplest possible financial structure for a safety researcher. His willingness to go on leave for advocacy suggests genuine conviction rather than career optimization.

Bridge builder. Krueger's network spans academic ML (Mila, Cambridge), safety orgs (CHAI, CSER), governance (UK AISI, CAIP, FLI), frontier labs (DeepMind internship, Leike co-authorship), and now public advocacy. Few people in the field have such broad connections.

Weaknesses and Risks

The advocacy pivot may be premature. If technical alignment is more tractable than Krueger believes, his pivot represents the loss of a productive PI, of supervision for several PhD students, and of the future researchers he would have trained. The opportunity cost is real: Krueger was one of very few safety-focused PIs at a top-tier ML institution.

Evitable has no plausible path to its stated goal. Banning advanced chip production globally would require unprecedented international cooperation, overcoming the resistance of the entire semiconductor and AI industries, and navigating US-China competition. Krueger acknowledges the "barriers are political, not technical" but provides no political strategy beyond "inform the public." ControlAI's approach (systematic lawmaker briefings) is more operationally concrete.

The coalition risk. Krueger's Evitable is now operating alongside groups of varying quality and reputation. The Stop AI group has been linked to violent threats. While Krueger aligns with the nonviolent wing, guilt by association is a real risk in a space where credibility matters enormously.

Rhetoric is escalating beyond what the evidence supports. Krueger's Fortune op-ed declares of alignment that "We should not expect any amount of investment to solve this in the foreseeable future," and calls for a "global shutdown of advanced AI development." These are extremely strong claims. The first goes beyond reasonable pessimism into unfalsifiable certainty. The second is a policy proposal that even most safety researchers don't endorse.

Evitable's governance is opaque. For an organization that criticizes AI companies for lack of accountability, Evitable has disclosed no board, no advisors, no fiscal sponsor identity, no budget, and no success metrics. This undermines the message.

PhD students may be harmed. Krueger going on leave from faculty likely disrupted students who were mid-PhD. No public information exists about arrangements for their supervision.

Cross-References

Compared to other academic alignment labs:

MIT Alignment (Dylan Hadfield-Menell): has a clear research agenda (cooperative inverse RL, safety of AI in society) and continues active research. KASL had no comparable agenda.
NYU ARG (Sam Bowman): focused on language model evaluation and honesty. More technically focused, less governance-oriented. Bowman has publicly shifted views about risk but continues research.
KASL was unique among academic labs in its PI openly doubting the tractability of the technical program while running a technical lab.

Compared to advocacy organizations:

PauseAI: grassroots, protest-focused, calls for a temporary pause. Similar constituency, but Evitable frames its goal as a permanent halt rather than a pause.
ControlAI: professionalized lobbying, 200+ lawmaker briefings. Far more concrete political strategy than Evitable.
Stop the AI Race: protest organization (Trazzi). Krueger spoke at their march. More activist, less academic.
Evitable's distinctive value-add is Krueger's personal credibility as a researcher. The question is whether that credibility is being deployed effectively.

Compared to Gradual Disempowerment-adjacent thinkers:

Paul Christiano: "What Failure Looks Like" (2019) described similar dynamics. Christiano continued technical work, first at OpenAI and then at ARC, which he founded.
Jan Kulveit (ACS Research): co-author of the GD paper; leads the research group most directly working on these questions.
Carl Shulman: has discussed similar economic displacement scenarios extensively. Advocates for governance solutions without opposing development.

Alumni connections:

Jesse Hoogland (KASL -> Timaeus executive director): carrying forward a connection between KASL's network and developmental interpretability.

What Would Change This Assessment

Evitable achieves a concrete policy outcome (e.g., legislation restricting chip exports for AI training, international agreement framework) within 12-18 months. This would demonstrate the advocacy model works.
Alignment research produces a major breakthrough that Krueger acknowledges changes his pessimism. This would undermine the intellectual basis for the pivot.
Evitable publishes a concrete political strategy with milestones, coalition partners, and success metrics. This would address the "no path to victory" weakness.
A prominent safety researcher publicly criticizes the pivot, articulating the opportunity cost argument. This would indicate the community's private views differ from the public silence.
The advocacy coalition suffers a reputational crisis (e.g., a linked group commits violence). This would test whether Krueger's credibility survives association.

Self-Critique

Weakest claim: My assessment that the advocacy pivot "may be premature" depends heavily on the tractability of technical alignment, which I genuinely don't know. If Krueger is right that alignment is unsolvable, then his pivot is not just justified but possibly heroic. My framing may be biased toward the academic/technical perspective simply because that's the dominant view in the safety community.

Potential bias: I may be underweighting Krueger's personal track record of high-impact coordination actions (CAIS statement, UK AISI) relative to his technical work. If his comparative advantage really is in coordination rather than research, the pivot is more justified than it appears.

Sources I should have checked:

The For Humanity podcast (Feb 2026) -- reportedly contains Krueger's shortest timeline estimate
The Nature article (failed to fetch)
Krueger's LessWrong post on academia vs. industry (referenced in the interview but not fetched)
More detail on KASL students' current situations

What a thoughtful disagreer would say: "Krueger is an accomplished researcher who looked at the evidence honestly and concluded the field is on the wrong track. Rather than continuing to collect a comfortable salary for work he doesn't believe in, he's taking a huge personal risk to do what he thinks is actually needed. The fact that most of the safety community disagrees with him doesn't make him wrong -- it might make him the only one being honest."

What information would most change my view: Evidence that a serious political coalition (not just protests) is forming around chip regulation, with Evitable playing a central coordinating role. This would address my biggest concern -- that the advocacy model has no plausible path to policy impact.

Connected to (15)

ControlAI · collaborator
PauseAI · collaborator
Alignment of Complex Systems Research Group · collaborator · Jan Kulveit
Center for AI Policy · advisor at · David Krueger
University of Cambridge · staff from · David Krueger
Anthropic · collaborator · Ryan Greenblatt
Center for AI Safety · collaborator · David Krueger
Future of Life Institute · advisor at · David Krueger
Timaeus · staff to · Jesse Hoogland
UK AI Safety Institute · advisor at · David Krueger
Berkeley Existential Risk Initiative · collaborator
Centre for the Study of Existential Risk · advisor at · David Krueger
CHAI · advisor at · David Krueger
Mila · staff from · David Krueger
Google DeepMind · staff from · David Krueger

Sources (51)

Every URL that was read during research.

1. Welcome to the Krueger Lab (kasl.ai)
2. Team (kasl.ai)
3. Publications (kasl.ai)
4. David Scott Krueger (davidscottkrueger.com)
5. David Krueger On Academic Alignment (theinsideview.ai)
6. Evitable (evitable.com)
7. AI is not inevitable (therealartificialintelligence.substack.com)
8. The real AI deploys itself (therealartificialintelligence.substack.com)
9. Announcing "The Real AI": a blog (therealartificialintelligence.substack.com)
10. The Real AI | David Krueger | Substack (therealartificialintelligence.substack.com)
11. David Krueger - Schmidt Sciences (schmidtsciences.org)
12. David Krueger - CIFAR (cifar.ca)
13. David Krueger - CSER (cser.ac.uk)
14. David Scott Krueger | Mila (mila.quebec)
15. David Krueger - Future of Life Institute (futureoflife.org)
16. David Krueger at Center for AI Policy | CAIP (centeraipolicy.org)
17. Two Visions of AI Apocalypse (Robert Wright & David Krueger) (nonzero.org)
18. Foundational Challenges in Assuring Alignment and Safety of Large Language Models (llm-safety-challenges.github.io)
19. Rogue AI is already here | Fortune (fortune.com)
20. What is David Krueger working on? (aisafety.info)
21. Professor David Scott Krueger Appointed Canada CIFAR AI Chair | Mila (mila.quebec)
22. Unknown (davidscottkrueger.com)
23. Systemic Existential Risks from Incremental AI Development (gradual-disempowerment.ai)
24. Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development (arxiv.org)
25. Managing extreme AI risks amid rapid progress (arxiv.org)
26. The Risk of Gradual Disempowerment from AI (thezvi.substack.com)
27. Let's talk about the CAIS statement on extinction risk from AI (coordination.substack.com)
28. Statement on AI Risk - Wikipedia (en.wikipedia.org)
29. My new nonprofit Evitable is hiring (ea.greaterwrong.com)
30. David Scott Krueger (scholar.google.ca)
31. Is AI’s brave new world unsafe for workers and humanity? (peoplesworld.org)
32. Open Roles - Evitable (evitable.com)
33. Interpreting Emergent Planning in Model-Free Reinforcement Learning (arxiv.org)
34. Defining and Characterizing Reward Hacking (arxiv.org)
35. Goal Misgeneralization in Deep Reinforcement Learning (arxiv.org)
36. Rise of the Robot Overlords (leightonwoodhouse.substack.com)
37. Antisocial media: AI’s killer app? (therealartificialintelligence.substack.com)
38. Gradual Disempowerment: Systemic Existential Risks from Continuous AI Development - AI • Objectives • Institute (ai.objectives.institute)
39. Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development (arxiv.org)
40. Gradual Disempowerment Might Not Be So Bad (beren.io)
41. Gradual Disempowerment, Shell Games and Flinches (greaterwrong.com)
42. Thoughts on Gradual Disempowerment (greaterwrong.com)
43. Gradual disempowerment | 80,000 Hours (80000hours.org)
44. A brief guide to the groups protesting over AI (transformernews.ai)
45. AI agents could pose a risk to humanity. We must act to prevent that future | David Krueger - United States (europesays.com)
46. No, AI isn't inevitable. We should stop it while we can. | Opinion (usatoday.com)
47. A 'Stop the AI Race' Rally and Protest Planned Saturday Outside Anthropic's and OpenAI's Headquarters (sfist.com)
48. 'Stop the AI Race': Why Protesters Marched on Silicon Valley's AI Giants - WorthvieW (worthview.com)
49. Press · Stop The AI Race (stoptherace.ai)
50. Artificial intelligence raises risk of extinction, experts say in new warning (apnews.com)
51. Andrew Critch and David Krueger publish “AI Research Considerations for Human Existential Safety (ARCHES)” (humancompatible.ai)