Theory of Change
CAIF's theory of change rests on the argument that AI alignment alone is not sufficient for good outcomes. Founder Allan Dafoe articulates the core claim, and goes further by suggesting that solving alignment may not even be strictly necessary:
"Alignment is insufficient for good outcomes. And to make an even stronger claim, you could say it's not necessary to solve alignment to have good outcomes. [...] If we're globally coordinated, then we could just appoint a reasonable decision maker to make this risk calculus, and that would satisfy humanity's collective view on how we should develop." (80K Hours, 2025)
The causal chain: As AI agents are deployed at scale -- trading assets, advising military commanders, negotiating on behalf of individuals -- their interactions will generate risks qualitatively different from single-agent alignment failures. Miscoordination, conflict, and collusion among AI agents could cause flash crashes, arms race escalation, or institutional erosion. CAIF aims to advance "cooperative intelligence" (the skills needed for agents to solve cooperation problems) relative to dual-use capabilities (deception, coercion), following a differential progress model. The bet: by building the field of cooperative AI research now, safety-relevant cooperative skills arrive before or alongside dangerous multi-agent capabilities.
CAIF explicitly acknowledges the two strongest counter-arguments. Paul Christiano and Carl Shulman argue that cooperative competence will scale automatically with general intelligence, making dedicated investment unnecessary. MIRI-adjacent thinkers argue the opposite: that making AI more socially skilled before solving alignment is dangerous, because it gives misaligned AI better tools for deceiving humans or cooperating against them. CAIF bets against both positions.
What They Do
CAIF runs the following programs with a team of 6 employees:
Research grants: 15 grants awarded totaling approximately USD 3.5M to universities worldwide (CMU, Oxford, MIT, Stanford, Harvard, Cornell, Bonn, UCL, NYU, Michigan, Washington, Uppsala, and others). All grants are to academic institutions except CLR and SIPRI. In 2024, they funded only 3 proposals (GBP 660K) -- less than 50% of budgeted capacity. They restructured the program for 2025 with narrower scope definitions and a two-stage application process.
PhD Fellowship: 14 fellows (2026) and 16 fellows (2025) at top universities. ~10.7% acceptance rate from 150 applications. Provides financial support for PhD students researching cooperative AI.
Summer School: Annual (2023 UK, 2024 Santa Cruz, 2025 Marlow UK). 65+ attendees in 2025. 74% reported increased motivation to pursue cooperative AI careers (self-reported).
Contests: NeurIPS Concordia Contest 2024 (197 participants, 878 submissions) in collaboration with Google DeepMind. Earlier Melting Pot Contest 2023 (672 submissions, 117 teams).
Research output: Flagship report "Multi-Agent Risks from Advanced AI" (February 2025, 50+ co-authors) taxonomizes risks into three failure modes and seven risk factors. "Agent Properties for Safe Interactions" (December 2025) proposes shifting from scenario simulation to studying constituent agent properties. Both are theoretical/taxonomic -- "foundational not operational" (AIGL review).
Policy engagement: TechPolicy Press article identifying EU AI Act Article 73 blind spots for multi-agent incidents. US AI Action Plan RFI submission. Athens Roundtable participation. Contributions to International AI Safety Report 2026.
Partnerships and fellowships: PIBBSS Cooperative AI Track (up to 6 fellows/year), MATS sponsorship (2 scholars, USD 34K), Cape Town Research Fellowship (3 months, 10 fellows, January-April 2026) in partnership with UCT and AI Safety South Africa.
Key People
Allan Dafoe -- Founder, Trustee. Simultaneously serves as Director of Frontier Safety & Governance at Google DeepMind. Previously founding director of GovAI. PhD in political science from Yale (technological determinism, great power conflict). Left GovAI in 2021 to advise Demis Hassabis "from inside the company."
Lewis Hammond -- Research Director (previously Acting ED). DPhil candidate at Oxford. Lead author of the flagship Multi-Agent Risks report. Reviewer for the International AI Safety Report 2026.
David Norman -- Managing Director (recently hired). Background in London Initiative for Safe AI, WWF, SABMiller.
The 5-person board includes: Gillian Hadfield (Chair, legal scholar, Johns Hopkins), Thore Graepel (Google DeepMind Distinguished Research Scientist, UCL Chair of ML, AlphaGo contributor), Jesse Clifton (former ED of Center on Long-Term Risk, connected to sole funder Macroscopic Ventures), and Audrey Tang (Taiwan's Cyber Ambassador, former Digital Minister).
Money and Incentives
Total budget: Annual income of GBP 1.97-2.42M (approximately USD 2.5-3M), funded by a single $15M endowment from Macroscopic Ventures.
Revenue breakdown: 100% from Macroscopic Ventures. No government income. No Coefficient Giving/Open Philanthropy grants. No other donors identified. No investment income. No earned revenue.
Expenditure pattern:
- 2022: GBP 91.5K (startup year)
- 2023: GBP 2.35M (GBP 1.80M grants to institutions)
- 2024: GBP 1.73M (GBP 750K grants to institutions)
- Assets: GBP 4.01M (approximately 2 years runway at current spend)
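The runway estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures listed above and assumes that "current spend" means the 2024 expenditure, that spending stays flat, and that no further income arrives. The function names are illustrative, not taken from any CAIF document.

```python
# Figures copied from the expenditure list above, in GBP millions.
ASSETS = 4.01
SPEND = {2023: 2.35, 2024: 1.73}
GRANTS = {2023: 1.80, 2024: 0.75}


def runway_years(assets: float, annual_spend: float) -> float:
    """Years of operation left if spending stays flat and no income arrives (assumption)."""
    return assets / annual_spend


def grant_share(grants: float, total_spend: float) -> float:
    """Fraction of a year's expenditure that went to institutional grants."""
    return grants / total_spend


if __name__ == "__main__":
    for year, spend in SPEND.items():
        print(
            f"{year}: runway at this spend = {runway_years(ASSETS, spend):.2f} years, "
            f"grant share = {grant_share(GRANTS[year], spend):.0%}"
        )
```

At the 2024 spend level the runway comes to roughly 2.3 years. At the higher 2023 spend level it would be closer to 1.7 years, so the "approximately 2 years" figure depends on which year is taken as representative.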
Business model: Pure grantmaker and field-builder. No products, services, or commercial revenue. Entirely dependent on single philanthropic commitment.
Funder: Macroscopic Ventures (formerly Center for Emerging Risk Research/CERR) has a broader mission: "help build a world guided by reason and compassion for all sentient beings." Focus areas include AI welfare, s-risk reduction, cooperative AI, reducing risks from fanatical ideologies, and animal welfare. Plans up to $100M in annual grantmaking. Also funds CLR ($3M separately to FOCAL/CMU), IAPS, Rethink Priorities, and others.
Incentive analysis: The single-funder structure aligns CAIF closely with Macroscopic Ventures' values but leaves it with no financial independence from that funder. Two of five trustees work at Google DeepMind. One trustee (Jesse Clifton, who also serves as "Grantmaking Officer") is connected to the sole funder via CLR/Macroscopic Ventures. There is no evidence that CAIF has published anything critical of DeepMind, and there are no publicly disclosed conflict-of-interest procedures covering the DeepMind and funder relationships.
Salary: One employee earns GBP 100-110K. A Community Manager role was advertised at GBP 45-55K. Both figures are modest by AI industry standards.
What Others Say
Self-criticism (CAIF): The 2025 grants update is the most substantive critical assessment available. CAIF admitted: "A large majority of the proposals that we received in 2024 were out of scope. Even among the proposals that were in scope, a large fraction were not properly aligned with our grantmaking priorities. This points to a failure in communication on our part." They under-spent their budget by over 50% and acknowledged that detailed rejection feedback "was often taken as encouragement by applicants and led to resubmissions also for proposals that were very unlikely to be funded."
External review (AIGL): The Multi-Agent Risks report is "rich in concepts" but "offers limited practical tooling or metrics for assessing multi-agent risk in deployed systems." "More foundational than operational."
Christiano/Shulman counter-argument (per Dafoe): The "super-cooperative AGI hypothesis" holds that cooperative competence scales automatically with intelligence. If true, CAIF's dedicated investment is unnecessary. Dafoe acknowledges this is "an important hypothesis to really think through" and does not dismiss it.
MIRI-adjacent counter-argument (per Dafoe): Some in the safety community "prefer the latter" of two worlds. They judge a world where "AI is relatively not that capable... but very good at cooperative skill" to be worse than one where AI is "extremely superhuman at material science, but amateur at cooperative skill." The reasoning is that making AI socially skilled before ensuring alignment gives it tools to outwit human overseers.
No independent criticism found: Despite extensive searching (35+ queries), no external critic has published a substantive challenge to CAIF's approach, theory of change, or effectiveness. Two possible explanations: (1) the organization is too new and small to attract critics, or (2) cooperative AI occupies a conceptual space that is hard to argue against without appearing to oppose cooperation itself.
What's Absent
- No independent evaluation of grant impact or research outcomes
- No public conflict-of-interest disclosure for DeepMind trustees' role in CAIF decisions affecting DeepMind-connected research
- No information on funder governance or endowment drawdown structure
- No evidence that CAIF's multi-agent risk analysis has changed any lab's deployment practices
- No key personnel departures or former employee/advisor public statements
- No Wikipedia article; minimal community discussion (2 relevant forum posts across LessWrong, the EA Forum, and the Alignment Forum)
Recommended Reading
Allan Dafoe on 80,000 Hours (2025) -- The most candid source on CAIF's intellectual foundations. Dafoe explains technological determinism, why cooperative AI matters as much as alignment, acknowledges the strongest counter-arguments, and describes his dual role at DeepMind. Start here. https://80000hours.org/podcast/episodes/allan-dafoe-unstoppable-technology-human-agency-agi/
"Updates to the CAIF Grant Program in 2025" -- Remarkably honest internal assessment of grantmaking failures. The most revealing document CAIF has published. https://www.cooperativeai.com/post/updates-to-the-caif-grant-program-in-2025
CLR Research Agenda: Cooperation, Conflict, and Transformative AI -- The intellectual ancestor of CAIF. Written by Jesse Clifton (now CAIF trustee). Reveals the s-risk/suffering-reduction motivation behind the funding chain. https://longtermrisk.org/research-agenda/
Macroscopic Ventures Focus Areas -- Essential for understanding the funder's worldview: s-risk reduction, AI welfare, "reason and compassion for all sentient beings." The values behind the $15M endowment. https://macroscopic.org/focus-areas