Theory of Change
CHAI's theory of change operates on two levels. At the technical level, Russell argues that the "standard model" of AI -- giving machines fixed objectives to optimize -- is fundamentally unsafe. His proposed replacement: machines should be uncertain about human preferences and learn them by observing human behavior, an approach formalized as Cooperative Inverse Reinforcement Learning (CIRL). In his words: "unless we, in some sense, rip out everything we know about AI and start again and do things in this different way, then things are heading in the wrong direction."
At the institutional level, CHAI aims to demonstrate this paradigm shift through peer-reviewed research, train a generation of safety-minded researchers who carry these ideas into industry, and leverage Russell's personal stature to shape global AI policy.
Russell's three principles for beneficial AI:
- The machine's only objective is to maximize the realization of human preferences.
- The machine is initially uncertain about what those preferences are.
- The ultimate source of information about human preferences is human behavior.
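The three principles can be illustrated with a toy Bayesian preference learner. This is a minimal sketch, not CHAI's CIRL formulation: the two preference hypotheses, the utility numbers, the Boltzmann-rational choice model, and the `beta` parameter are all assumptions made for illustration.

```python
import math

# Hypothetical setup: two candidate hypotheses about what the human wants.
# Each assigns a utility to every option (all numbers are illustrative).
HYPOTHESES = {
    "likes_tea":    {"tea": 1.0, "coffee": 0.0, "water": 0.6},
    "likes_coffee": {"tea": 0.0, "coffee": 1.0, "water": 0.6},
}

def choice_likelihood(choice, utilities, beta=3.0):
    """Assumed Boltzmann-rational human: P(choice) is proportional to exp(beta * utility)."""
    weights = {option: math.exp(beta * u) for option, u in utilities.items()}
    return weights[choice] / sum(weights.values())

def update(belief, observed_choice):
    """Principle 3: human behavior is the evidence. Apply Bayes' rule."""
    unnormalized = {
        h: p * choice_likelihood(observed_choice, HYPOTHESES[h])
        for h, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

def best_action(belief):
    """Principle 1: maximize expected human utility under the current belief."""
    options = list(next(iter(HYPOTHESES.values())))
    def expected_utility(option):
        return sum(p * HYPOTHESES[h][option] for h, p in belief.items())
    return max(options, key=expected_utility)

# Principle 2: the machine starts out uncertain.
prior = {"likes_tea": 0.5, "likes_coffee": 0.5}

# The human is observed choosing tea twice.
posterior = update(update(prior, "tea"), "tea")

print(best_action(prior))      # hedges with the safe option while uncertain
print(best_action(posterior))  # commits once behavior has been observed
print(posterior)
```

Under the uniform prior the learner picks the hedge option; after two observations it commits to the option the human appears to prefer, while the posterior stays strictly below 1.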
The causal chain: Develop the CIRL framework and adjacent theory -> Train researchers who internalize the paradigm -> Those researchers carry it into DeepMind, Anthropic, OpenAI, and academia -> Russell personally advocates for regulatory structures -> The field gradually shifts from fixed objectives to provably beneficial AI.
What They Do
Research: 32+ papers per year (2022-2023 period). Key areas: assistance games, adversarial robustness, recommender systems, multi-agent cooperation, social impacts. Major result: simple adversarial strategies can beat superhuman Go AIs (ICML 2023), evidence that superhuman performance in a domain does not imply robust understanding of it. The foundational CIRL paper (NeurIPS 2016) introduced the formal framework. "Towards Guaranteed Safe AI" (2024) represents a more implementation-oriented evolution built on three components: a world model, a safety specification, and a verifier.
Talent pipeline: This may be CHAI's most impactful output. Alumni now hold influential positions across the safety ecosystem: Anca Dragan (Head of AI Safety, Google DeepMind), Rohin Shah (AGI Safety lead, DeepMind), Dylan Hadfield-Menell (MIT faculty), Adam Gleave (FAR.AI founder), Rosie Campbell (ex-OpenAI policy), Scott Emmons (Anthropic), Sam Toyer (OpenAI Model Safety), Lawrence Chan (METR). Four PhD completions in the 2022-23 period.
Policy engagement: Russell personally engaged with an extraordinary breadth of policymakers in 2022-2023: UK 10 Downing Street, US Senate (Schumer, Blumenthal, Heinrich), EU AI Act negotiators (instrumental in classifying recommender systems as "high-risk"), France, Singapore, Argentina, Netherlands, China, WEF, OECD, UNESCO, GPAI. Co-authored the FLI "Pause Giant AI Experiments" open letter (March 2023). Senate testimony (July 2023) calling for an FDA-like AI regulatory agency.
Institution-building: IASEAI (International Association for Safe and Ethical AI) founded 2025, inaugural Paris conference: ~700 in-person, 1400 online, keynote by Hinton. Annual CHAI workshops at Asilomar (8th in 2024, 200+ attendees). NSF PSBAI workshop (2022, first government-funded). "Slaughterbots" anti-autonomous-weapons video (70M+ views).
Open source: Modest. GitHub repos include overcooked_ai, tensor-trust, imitation, ranking-challenge. Jonathan Stray's Prosocial Ranking Challenge is the most notable applied project.
Key People
Stuart Russell -- Faculty Director, Founder. Author of the standard AI textbook (AIMA, 1500+ universities). Author of Human Compatible (2019). Professor at UC Berkeley since 1986. Born 1962, British. BBC Reith Lecturer 2021, TIME100 AI 2025, OBE, Fellow of the Royal Society 2025. Also directs the Kavli Center, co-chairs OECD and WEF AI groups, and serves as IASEAI President. Russell IS CHAI -- his personal stature and network are the org's primary asset and its primary vulnerability.
Mark Nitzberg -- Executive Director. Manages day-to-day operations. Computer vision background, Wilson Center fellow. More operationally focused than Russell.
Anca Dragan -- Co-PI, on leave since early 2024 to head AI Safety and Alignment at Google DeepMind. Co-author of the foundational CIRL paper. Her departure is the most significant leadership change in CHAI's history.
Team size: ~9 faculty investigators, 18 affiliate faculty, ~30 graduate/postdoctoral researchers, ~25 PhD students, ~5 staff, ~7 interns/year (Founders Pledge estimate).
Money and Incentives
Total confirmed funding: ~$20.6M from Open Philanthropy (2016-2024), delivered through two channels:
- $16.9M via UC Berkeley (founding grant $5.56M in 2016 + renewal $11.36M in 2021, both 5-year grants)
- $3.54M via BERI (ML engineers, compute cluster, internships, operational support, 2017-2024)
Estimated annual budget: ~$3M/year (Founders Pledge). This likely undercounts total resources, since UC Berkeley subsidizes faculty salaries, lab space, IT, and benefits.
Business model: University research center funded primarily by philanthropic grants, with the university providing base infrastructure. PhD students are funded through standard university mechanisms (TAships, fellowships). External grants supplement rather than replace university support.
Funder concentration: Open Philanthropy is overwhelmingly dominant (~$20.6M). SFF has provided ~$898K. Other listed sponsors (FLI, Leverhulme Trust, CITRIS, NSF) have contributed smaller or unknown amounts. Individual donations are accepted via Every.org and Berkeley Big Give. CHAI files no separate 990 -- its finances are folded into UC Berkeley's consolidated reporting.
BERI as intermediary: BERI (501(c)(3), EIN 81-4820272) serves as fiscal intermediary, providing operational flexibility that UC Berkeley's bureaucracy cannot. BERI hires staff, manages compute, handles logistics. Andrew Critch co-founded BERI in 2017 and works at CHAI part-time.
Incentive analysis: Academic incentives (publications, tenure, prestige) align with producing rigorous theoretical work but may favor publishability over practical safety impact. The talent pipeline to industry is a feature (field-building) but also a constraint (CHAI can't retain top researchers against industry compensation). Russell has no visible financial conflicts with AI labs -- he is a tenured professor without equity stakes, board seats, or consulting ties to frontier labs. This genuine independence is rare among influential AI safety figures.
Key incentive risk: Extreme dependence on a single funder, Open Philanthropy (OP). If OP's priorities shifted (e.g., toward more empirical/applied safety work and away from theoretical frameworks), CHAI's funding could be at risk. However, being embedded in UC Berkeley provides a floor.
What Others Say
The strongest technical critique (MIRI, via Scott Alexander): CIRL's corrigibility depends on the AI being uncertain about human preferences. But as the AI learns, its uncertainty decreases, and with it the reason to defer to humans. The AI has a "sixth option": refuse shutdown, keep learning, then optimize sovereignly. Russell responds that proper Bayesian priors should prevent absolute certainty. MIRI counters that the issue isn't absolute certainty but the point at which the expected value of further information from humans drops below the cost of deferring -- past that point the AI acts regardless. Scott Alexander: "their crux seems to be whether the AI could end up with an uncorrectably wrong model of the human utility function."
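The disagreement above can be made concrete with a toy calculation loosely modeled on an off-switch setting. This is a sketch under stated assumptions, not either side's formal model: the AI's belief about the utility U of its proposed action is assumed Gaussian, the human is assumed to veto the action exactly when U < 0, and `cost` is a hypothetical fixed price of deferring. The function names are illustrative.

```python
import math

def expected_positive_part(mu, sigma):
    """E[max(U, 0)] for U ~ Normal(mu, sigma^2): the value of acting only
    when a perfectly informed human approves."""
    if sigma == 0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return mu * cdf + sigma * pdf

def deference_gain(mu, sigma):
    """Extra expected utility from deferring, compared with the better of
    acting unilaterally (value mu) or switching off (value 0)."""
    return expected_positive_part(mu, sigma) - max(mu, 0.0)

def defers(mu, sigma, cost):
    """The AI defers only while the gain from human oversight exceeds its cost."""
    return deference_gain(mu, sigma) > cost

# Same mean estimate, shrinking uncertainty, fixed (hypothetical) cost.
for sigma in (2.0, 1.0, 0.5, 0.1):
    gain = deference_gain(1.0, sigma)
    print(f"sigma={sigma}: gain={gain:.6f}, defers={defers(1.0, sigma, cost=0.05)}")
```

Both positions show up in the numbers. The gain from deferring stays positive for any nonzero uncertainty, which matches Russell's point that a proper prior never yields absolute certainty. The gain also falls below any fixed cost once uncertainty is small enough, which is the threshold MIRI's counterargument turns on.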
IRL scalability (Kasenberg): Three fundamental limitations: (1) moral norms are temporally complex, but IRL assumes reward depends only on current state; (2) reward functions are domain-specific and don't transfer; (3) IRL outputs are opaque numbers, not interpretable principles.
"Dumb superintelligence" fallacy (Melanie Mitchell): True intelligence inherently involves common sense, adaptability, and context-sensitivity. A superintelligent entity that simultaneously lacks basic understanding is a contradiction. The paperclip maximizer scenario may be incoherent.
"Too early" (Robin Hanson): Russell hasn't made the case for reorganizing all of AI now. We are too far from knowing how future AI systems will be organized. Like warning about nuclear weapons in 1500.
"Blinkered" rationality (David Leslie, Nature review): Russell reduces intelligence to instrumental rationality and falls prey to "techno-solutionism." His framework ignores holistic, context-dependent accounts of reasoning.
Positive assessments: Open Philanthropy rates CHAI "one of the highest-impact organizations working on AI alignment in the world." Founders Pledge assessment is broadly positive: "CHAI is especially well-placed to produce reliably positive impact."
What's Absent
- No financial transparency beyond grant amounts. No separate 990, no published budget breakdown.
- No succession plan. Russell is 63 with no visible heir. Dragan (most plausible successor) left for DeepMind.
- No published impact evaluation. After 10 years and $20M+, no systematic assessment of whether CIRL has been adopted, whether alumni have changed practices at labs, or whether policy advocacy produced specific outcomes.
- No evidence of CIRL adoption in production. The framework is influential conceptually but there is no evidence it has been implemented in any AI product or system.
- No progress report since May 2023. A two-year gap in public reporting during AI safety's most critical period.
- Limited engagement with LLM alignment. CHAI's core program was designed for a different AI paradigm. RLHF, constitutional AI, and other LLM-specific techniques are not prominently represented in their publications.
- Minimal independent evaluation. The strongest endorsements come from within the OP/EA funder ecosystem. External academic evaluations of CHAI's approach are largely absent.
Recommended Reading
Stuart Russell on Diary of a CEO (2025) -- Russell at his most emotionally urgent. Reports private conversations with AI CEOs about extinction risk. Explains the gorilla problem, fast takeoff, and why he's "appalled." The most candid window into his current thinking. https://singjupost.com/stuart-russell-on-the-diary-of-a-ceo-podcast-transcript/
Scott Alexander, "CHAI, Assistance Games, And Fully-Updated Deference" -- Both explains CHAI's approach clearly AND presents the strongest technical counterargument. Required reading. https://www.astralcodexten.com/p/chai-assistance-games-and-fully-updated
Stuart Russell on 80K Hours (#80, 2020) -- The comprehensive intellectual foundation: standard model critique, three principles, counterarguments, policy ideas. More detailed than any other single source on Russell's worldview. https://80000hours.org/podcast/episodes/stuart-russell-human-compatible-ai/
Robin Hanson review of Human Compatible -- The skeptic's case: it's too early, the ask is too large, and Russell hasn't engaged with alternatives. https://www.overcomingbias.com/p/russells-human-compatiblehtml
Founders Pledge assessment of CHAI -- The most thorough independent evaluation available, including team size, research output, and field-building assessment. https://www.founderspledge.com/research/center-for-human-compatible-ai