Theory of Change
CAIS articulates its mission as reducing "societal-scale risks associated with AI" through three pillars: research, field-building, and advocacy.
Their most comprehensive risk framework identifies four categories: malicious use (bioterrorism, autonomous agents), AI race dynamics (military and corporate arms races, evolutionary pressures), organizational failures (accidents, safety culture), and rogue AI (proxy gaming, deception, power-seeking). Each has proposed interventions spanning technical research, regulation, and international coordination.
The theory has evolved significantly. Early CAIS (2022-2023) focused on technical safety research and public awareness (the extinction risk statement). By 2025, the primary contribution is the Superintelligence Strategy paper (co-authored with Eric Schmidt and Alexandr Wang), which proposes MAIM -- Mutual Assured AI Malfunction -- a deterrence regime where states sabotage each other's destabilizing AI projects. Components: deterrence through sabotage threats, nonproliferation to keep weaponizable capabilities from rogue actors, and competitiveness through economic and military AI development.
Hendrycks frames this shift in a March 2025 interview: "I don't think labs have an extremely large role in safety overall... They're kind of predetermined to race... can't really choose not to. Safety is much more of a broader problem. It's got some technical aspects, but I think that's a small part of it." He distinguishes alignment from safety: "China can have AIs that are totally aligned with them. The US can have AIs that are totally aligned with them. You still are going to have a strategic competition."
In a Lawfare Daily transcript (March 2025), Hendrycks describes specific mechanisms he has been socializing in Washington: CIA espionage cells monitoring rival AI programs, CyberCom preparing attacks on adversary data centers, and moving data centers outside cities for "city avoidance."
What They Do
Research. Strong publication record for a 3-year-old org with ~10-15 staff. Notable outputs: Circuit Breakers (NeurIPS 2024 -- required 20,000 jailbreak attempts to bypass), WMDP benchmark (ICML 2024), HarmBench (adopted by US and UK AI Safety Institutes for pre-deployment testing), safetywashing paper (NeurIPS 2024, showing most safety benchmarks correlate with general capabilities), Tamper-Resistant Safeguards for open-weight models, Representation Engineering (RepE), Humanity's Last Exam (co-created with Scale AI, 3,000+ expert contributors). Hendrycks also created MMLU -- the most widely used AI capability benchmark -- before founding CAIS. He explicitly opposes mechanistic interpretability in a published article, arguing RepE and circuit breakers are more practical than Anthropic-style neuron-level analysis.
Compute cluster. 80 A100 GPUs supporting ~350 researchers, enabling 109 cumulative papers with 4,000+ citations. Now restricted to Schmidt Sciences AI safety grantees. Arguably CAIS's highest-impact program by volume of output.
Advocacy. The one-sentence Statement on AI Risk (May 2023) -- signed by Altman, Amodei, Hassabis, Hinton, Bengio, and 1,000+ others -- mainstreamed extinction risk. CAIS Action Fund co-sponsored SB 1047 in California, building a coalition of 70+ academics and unions, with 77% of CA voters in support, before Newsom's veto. DC launch event (July 2024) with bipartisan congressional keynotes. $270K in federal lobbying in 2024. Secured $10M congressional funding for the US AI Safety Institute. Hendrycks is currently socializing MAIM at the White House.
Field-building. ML Safety course (1,000+ participants), Philosophy Fellowship (7 months, 18 papers), AI & Society Fellowship (3 months, economists/lawyers/IR scholars), AI Safety textbook (Taylor & Francis), SafeBench competition ($250K prizes), AI Safety Newsletter with 43,000+ subscribers.
Key People
Dan Hendrycks -- Executive Director. Born 1994/95, Marshfield MO. Evangelical upbringing shaped his moral catastrophism. PhD UC Berkeley 2022. Created GELU and MMLU. TIME100 AI 2023. Advises xAI ($1/year) and Scale AI ($12/year), no equity. Co-authored Superintelligence Strategy with Eric Schmidt and Alexandr Wang. Co-founded Gray Swan AI but divested August 2024 amid conflict-of-interest criticism. p(doom) reportedly peaked at >80%, now ~50-50. Compensation: $314,534 (2024). Credits 80,000 Hours for career direction but distances from EA: "AI safety has outgrown the EA community."
Nick Beckstead -- Policy Lead (2024, $198K). Formerly CEO of FTX Future Fund and Open Philanthropy program officer. Has since departed to found the Secure AI Project. His path runs through the same EA funding ecosystem (FTX, Open Philanthropy) that supplied most of CAIS's early money.
Jaan Tallinn -- Skype co-founder and primary funder of SFF (which gave CAIS ~$2.8M in 2024). Appears in a governance-adjacent role, which creates funder-governance overlap.
Team size: ~10-15 FTEs based on salary data. Apart from Beckstead's move to the Secure AI Project, no public evidence of staff departures in 3+ years.
Money and Incentives
Total budget. 2024 revenue: $10.2M (contributions $9.6M, investment $384K, program services $257K). 2024 expenses: $7.1M. Total assets: $12.6M. Revenue peaked at $16.1M in 2023, when Open Phil grants were still flowing. Contributions make up 93.8% of revenue.
Funding sources:
- FTX (2022): $6.5M received May-September 2022. Bankruptcy estate sought clawback; CAIS refused voluntary accounting. Resolution unknown; liabilities dropped from $5.45M (2023) to $1.03M (2024).
- Open Philanthropy (2022-2023): 4 grants totaling $12.49M. The October 2023 grant was labeled an "exit grant" providing "approximately one year of operational support." No further OP funding. No public explanation for ending the $12.5M relationship.
- SFF / Jaan Tallinn (2024-2025): ~$2.8M in 2024, ~$1.8M in 2025.
- Schmidt Sciences: In-kind compute (the 80-GPU A100 cluster, restricted to Schmidt grantees) and its $10M AI Safety Science program. Eric Schmidt co-authored the flagship paper.
- Unknown: ~$6-7M of 2024 contributions from unidentified sources.
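The figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch, not a reconstruction of CAIS's filings: it uses the rounded numbers quoted in this section and assumes (this is not confirmed by any source here) that the SFF grants are the only publicly identified cash contributions for 2024. Variable names are illustrative.

```python
# All amounts in millions of USD, rounded as quoted in this section.
contributions = 9.6      # 2024 contributions
investment = 0.384       # 2024 investment income
program_services = 0.257 # 2024 program service revenue
sff_2024 = 2.8           # approximate 2024 SFF funding (assumed to be the only identified donor)

# Total revenue is the sum of the three reported components.
total_revenue = contributions + investment + program_services

# Share of revenue that comes from contributions.
contribution_share = contributions / total_revenue

# Contributions left over after subtracting the identified SFF funding.
unattributed = contributions - sff_2024

print(f"total revenue:      ${total_revenue:.2f}M")
print(f"contribution share: {contribution_share:.1%}")
print(f"unattributed:       ${unattributed:.1f}M")
```

With these rounded inputs the contribution share comes out within a fraction of a point of the stated 93.8%, and the unattributed remainder falls inside the ~$6-7M range given above. In-kind support such as compute is left out because no dollar value is given for it.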
Early funding was almost entirely EA-affiliated: $12.5M from Open Phil + $6.5M from FTX = ~$19M. CAIS now actively repositions toward the national security establishment while publicly distancing from EA.
Incentive structure. Hendrycks advises xAI and Scale AI for nominal pay. Schmidt Sciences provides compute and co-authored the flagship policy paper. Gray Swan AI (co-founded by Hendrycks, divested 2024) would benefit commercially from safety mandates CAIS advocates. These relationships create structural incentives even when financial conflicts are minimized: Hendrycks' credibility as a safety voice depends on industry connections, while those companies gain safety legitimacy from his association.
What Others Say
Strongest case against MAIM: MIRI identifies 5 conditions MAIM must meet for deterrence and argues it falls short -- "breakout distance" between acceptable AI use and decisive strategic advantage is too short, monitoring is impractical, sabotage can only delay (not deny). IAPS estimates only ~25% chance of MAIM dynamics actually occurring. The AI Frontiers observability analysis argues the US and China fundamentally cannot monitor each other's AI development -- you can't deter what you can't detect.
Zvi Mowshowitz calls MAIM "not crazy" but notes "our planetary track record of following through in even the most obvious of situations is highly spotty."
Extinction statement critics argue tech leaders signing the statement benefit from inflated perceptions of AI power. Timnit Gebru called it a "DDoS attack on attention." A Harvard Data Science Review article characterizes the extinction narrative as "a bid for power" using dramatic forms from Greek tragedy.
xAI credibility problem. Hendrycks advises xAI, which released Grok 4 in July 2025 without any safety report despite Seoul summit commitments. Anthropic's Samuel Marks called this "reckless." When the Director of the Center for AI Safety advises a company that does not follow industry safety standards, the gap between advocacy and practice is impossible to ignore.
Even evals critics validate CAIS. A LessWrong post arguing AI evaluation regimes are harmful cites the CAIS extinction statement as "more useful than eval results."
What's Absent
No documented conflict-of-interest policy despite Hendrycks' advisory roles at xAI and Scale AI, Gray Swan co-founding, and Tallinn's funder-governance overlap. No public explanation for why Open Philanthropy ended funding after $12.5M. No identified source for approximately $6-7M of 2024 funding. No evidence of independent board oversight. No public statements from departing staff in 3+ years. The specifics of Hendrycks' xAI advisory role -- hours, influence, actual impact -- are undocumented. Co-founder Oliver Zhang's current role is not publicly visible.
Recommended Reading
No Priors podcast with Hendrycks (March 2025) -- Most candid articulation of why he thinks safety is geopolitical, not technical. https://podscripts.co/podcasts/no-priors-artificial-intelligence-technology-startups/national-security-strategy-and-ai-evals-on-the-eve-of-superintelligence-with-dan-hendrycks
MIRI's "Refining MAIM" critique (April 2025) -- Best-argued case that CAIS's central policy proposal has fundamental implementation flaws. https://intelligence.org/2025/04/11/refining-maim-identifying-changes-required-to-meet-conditions-for-deterrence/
Fortune: xAI releases Grok 4 with no safety report (July 2025) -- The credibility gap. https://fortune.com/2025/07/17/elon-musk-xai-grok-4-no-safety-report/
Boston Globe profile of Hendrycks (July 2023) -- Evangelical upbringing, 80K Hours influence, founding CAIS. https://www.bostonglobe.com/2023/07/06/opinion/ai-safety-human-extinction-dan-hendrycks-cais/
Hendrycks on mechanistic interpretability (May 2025) -- CAIS's intellectual position against Anthropic's largest research bet. https://ai-frontiers.org/articles/the-misguided-quest-for-mechanistic-ai-interpretability