Theory of Change
KASL and Evitable represent two phases of the same person's evolving theory of change.
Phase 1 -- KASL (2021-2025): An academic alignment lab at Cambridge and then Mila/UMontreal. David Krueger's approach was unusual: he had no research agenda. Instead he hired talented people who cared about safety and let them drive their own projects. As he put it: "I've been thinking about this stuff for a long time, more than 10 years now, and it's always seemed like a really hard problem and I don't see any super promising paths towards solving it." His lab produced influential work on goal misgeneralization, reward hacking, and broken neural scaling laws, while Krueger increasingly focused on coordination and governance.
Phase 2 -- Evitable (2025-present): Krueger concluded that technical alignment is insufficient for existential safety and pivoted to public advocacy. Evitable's theory of change is direct: inform the public about AI risks, organize opposition, and push for a global shutdown of advanced AI chip production. "The simplest, most robust way would be to cease production of advanced AI chips... These chips are the 'weapons-grade plutonium' of superintelligence." The intellectual foundation is Krueger's "Gradual Disempowerment" paper (2025), which argues that even incrementally advancing, non-power-seeking AI will displace humans from economic, cultural, and political systems, leading to irreversible human disempowerment.
The connecting thread between the two phases: even while running a technical lab, Krueger stated publicly, "I'm pretty pessimistic about the technical approaches. I don't think alignment is a problem that can be solved... we're going to need to have some ability to coordinate." The pivot was long in the making.
What They Do
KASL's research output includes several widely cited papers:
- Goal Misgeneralization in Deep RL (ICML 2022, 233 citations) -- demonstrated RL agents pursuing wrong goals in novel environments
- Defining and Characterizing Reward Hacking (NeurIPS 2022, 601 citations) -- formal theoretical framework for reward gaming
- Broken Neural Scaling Laws (ICLR 2023, 145 citations)
- Foundational Challenges in LLM Safety (2024, 322 citations, 35+ co-authors including Bengio)
- Interpreting Emergent Planning (ICLR 2025 oral)
High-impact actions beyond research:
- Initiated the CAIS Statement on AI Risk (May 2023), signed by hundreds of researchers including Hinton, Bengio, Altman, and Hassabis. This one-sentence statement ("Mitigating the risk of extinction from AI should be a global priority...") made global headlines and was acknowledged by the UK Prime Minister and the White House.
- Research Director on the founding team of the UK AI Security Institute (2023).
- Co-authored "Managing Extreme AI Risks Amid Rapid Progress" in Science (2024, 732 citations) with Bengio and Hinton.
- In 2026: op-eds in Fortune, the Guardian, and USA Today; testimony at the Canadian Parliament; a speech at the Stop the AI Race protest in San Francisco, which marched on Anthropic, OpenAI, and xAI.
Evitable (founded 2025) is hiring for operations, communications, chief of staff, and movement building. It is listed as a supporting group for the Stop the AI Race movement. It does not do research.
KASL is on hiatus for 2026 while Krueger is on leave from his faculty position. He is not accepting new students.
Key People
David Krueger -- Founder/PI. PhD under Yoshua Bengio at Mila (2013-2021). Interned at DeepMind AI Safety (2018). Among the most-cited safety researchers globally (top paper: NICE, 3,338 citations; top safety paper: Open Problems in RLHF, 1,003 citations). Co-authored with Hinton, Bengio, Leike (DeepMind/Anthropic), Critch (CHAI). CIFAR AI Chair, IVADO Professor of Responsible AI, core member at Mila, CHAI, and CSER. Initiated the CAIS extinction risk statement. In 2026, fully pivoted to advocacy via Evitable.
Notable alumni/collaborators: Jesse Hoogland became executive director of Timaeus. Micah Carroll visited from Berkeley (Dragan/Russell group). Ryan Greenblatt (Redwood/Anthropic) co-authored password-locked models work. Joar Skalse (Oxford) co-authored reward hacking theory.
Lab size: ~8 PhD students at peak. People page is now essentially empty (lab on hiatus).
Money and Incentives
KASL's funding model was the standard academic one: university salary (Cambridge, then UMontreal) plus targeted grants. Total known external safety funding:
- Open Philanthropy: $153,102 across 3 grants (2022), via BERI; plus a prior 2021 grant of unknown amount
- Schmidt Sciences: one grant for test set contamination research (amount undisclosed)
- CIFAR AI Chair: ~$200K/yr (standard for this Canadian government appointment)
- IVADO Professorship in Responsible AI: amount undisclosed
Evitable funding: ~$40K tentatively committed from private donors (via Manifund). Anticipated but not confirmed: FLI, SFF, and LTFF grants. A fiscal sponsor has signed, but its identity is undisclosed.
Key financial observations:
- KASL was among the most modestly funded safety research groups: its $153K from Open Philanthropy compares with millions for orgs like ARC, Redwood, or Mila-affiliated labs generally. The university infrastructure absorbed most costs.
- No AI lab dependencies. No compute provider conflicts. No product revenue. No dual-use pressures. The university-based model is the simplest and most conflict-free funding structure in the safety space.
- Evitable's financial sustainability is genuinely uncertain: it is hiring for 3+ roles with only ~$40K committed.
- Krueger is on leave from his faculty position for 2026. Whether this is paid or unpaid leave is unknown. If unpaid, he is absorbing significant personal cost.
- Evitable seeks longtermist funding (FLI, SFF, LTFF) despite populist public messaging. This creates a potential tension.
What Others Say
On the Gradual Disempowerment paper:
Zvi Mowshowitz (prominent AI risk commentator) strongly endorsed the paper: "I am in violent agreement with this paper, perhaps what one might call violent super-agreement." He added, however, that all proposed mitigations are insufficient and that the scenario is the "baseline" for worlds that survive more acute threats.
Beren Millidge offered the sharpest critique: "the gradual disempowerment scenario is essentially the lived experience of almost all individual humans anyway." In his view, the median human already has no control over the economy, culture, or technology. He argues that AI could improve quality of life even without human control, especially if humans retain a tiny fraction of cosmic resources.
An anonymous reviewer ("Thoughts on GD") argued that "the most convincing versions of gradual disempowerment either rely on misalignment or result in power concentration among humans, not total disempowerment." The reviewer questions whether GD is a distinct risk or just another lens on existing concerns.
80,000 Hours created a dedicated problem profile for Gradual Disempowerment, treating it as a distinct career-worthy area.
On the advocacy pivot:
No public criticism found. This silence is notable: if alignment talent is scarce, losing a productive PI should be costly. The absence of objection may indicate more private agreement with Krueger's pessimism than is publicly expressed.
On the CAIS statement (2023):
Critics included Timnit Gebru (hype from "the same people who have poured billions into these companies"), Emile Torres (motivated by "TESCREAL" ideologies), and Human Rights Watch (should focus on known risks rather than speculative future risks). Defenders argued the statement created common knowledge and was a necessary first step.
On the protest movement:
The Stop the AI Race protest coalition includes groups of varying reputation. Stop AI (distinct from Stop the AI Race) has been linked to violent threats and arrests. PauseAI maintains nonviolent principles. Krueger's Evitable aligns with the nonviolent wing but operates in a coalition where these groups overlap.
What's Absent
- No concrete theory of victory for Evitable's chip ban proposal. The endpoint is stated, but there is no political roadmap from "public outrage" to "international chip production ban."
- No engagement with the political economy of chip bans (TSMC/Taiwan, ASML/Netherlands, massive economic and national security implications).
- No disclosed governance structure for Evitable (board, advisors, fiscal sponsor identity).
- No response to substantive GD criticisms from Beren, the "Thoughts on GD" author, or Zvi.
- No information about KASL students' situations during Krueger's leave. Several were mid-PhD.
- No measurable success criteria for Evitable.
- No evidence of Evitable policy impact yet (vs. ControlAI's 200+ lawmaker briefings).
Recommended Reading
David Krueger on The Inside View (2023) -- 3-hour interview transcript. The most candid, unfiltered source on Krueger's thinking. Covers: why he's pessimistic about technical alignment, why coordination matters more, how he ran his lab, what a solution might look like. Read this first. https://theinsideview.ai/david
"Gradual Disempowerment Might Not Be So Bad" by Beren Millidge (2025) -- The strongest counterargument to Krueger's signature idea. Argues the median human is already totally disempowered and the transition may not matter much in practice. https://www.beren.io/2025-11-23-Gradual-Disempowerment-Might-Not-Be-So-Bad/
Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development (2025) -- The paper providing the intellectual foundation for Evitable. Well-argued, seriously engages with the mechanisms by which aligned AI could still disempower humanity. https://gradual-disempowerment.ai/
"Rogue AI is already here" -- Fortune op-ed (March 2026) -- Krueger's most aggressive public position. Calls for global AI shutdown, attacks Anthropic for abandoning RSP commitments, declares alignment unsolvable. https://fortune.com/2026/03/27/rogue-ai-agents-autonomous-safety/
"The Risk of Gradual Disempowerment from AI" by Zvi Mowshowitz (2025) -- Prominent endorsement plus sharp analysis of why proposed mitigations are insufficient. Frames GD as the "Phase 2" problem. https://thezvi.substack.com/p/the-risk-of-gradual-disempowerment