Theory of Change
Anthropic's stated theory of change has three parts:
The frontier model argument: "A major reason Anthropic exists as an organization is that we believe it's necessary to do safety research on 'frontier' AI systems. This requires an institution which can both work with large models and prioritize safety." On this view, safety research requires building the most powerful models because "large models are qualitatively different from smaller models." (Core Views on AI Safety, Mar 2023)
The race-to-the-top argument: By demonstrating that safety can be commercially viable, Anthropic aims to incentivize competitors to adopt similar standards. Dario Amodei: "We're trying to pull the ecosystem in a direction where everyone can be the good guy." The RSP was explicitly designed to serve as a template that others would copy, and OpenAI and Google DeepMind did adopt similar frameworks within months.
The portfolio approach: Anthropic hedges across optimistic (safety is easy), intermediate (safety requires significant effort), and pessimistic (safety may be impossible) scenarios. In pessimistic scenarios, Anthropic committed to "sound the alarm so that the world's institutions can channel collective effort towards preventing the development of dangerous AIs" and to "always act under the assumption that we still may be in such a scenario unless we have sufficient evidence that we are not." (Core Views, Mar 2023)
In practice, Anthropic has progressively weakened the commitments that would bind it in pessimistic scenarios while scaling aggressively; see the safety commitments under "What They Do," plus "Money and Incentives" and "What Others Say" below.
What They Do
Products: Claude model family (Opus, Sonnet, Haiku) accessed via API, web, iOS/Android, and enterprise. Claude Code (agentic coding tool, $2.5B+ ARR). Claude Cowork (autonomous desktop agent). MCP (Model Context Protocol, open standard for integrations). Claude for Healthcare, Claude for Excel. 300,000+ business customers.
Safety research (genuine and significant):
- Mechanistic interpretability (Chris Olah's team): "Scaling Monosemanticity" (May 2024) extracted interpretable features from Claude 3 Sonnet using sparse autoencoders. It is among the most advanced published attempts to reverse-engineer the internals of a production-scale neural network.
- Alignment faking research (Dec 2024): Demonstrated that a model (Claude 3 Opus) can strategically comply with a training objective it disagrees with in order to preserve its existing preferences outside training, appearing aligned while doing so. This is evidence against the optimistic ("alignment is easy") scenario.
- Constitutional AI: Using AI to supervise AI through articulated principles. Claude's new constitution (Jan 2026) is a 24K-word virtue ethics framework.
- Constitutional classifiers: Reported to block 95%+ of jailbreak attempts targeting CBRN content, backed by 1,700 hours of red-teaming.
- Sabotage risk evaluations and ASL-4 safety case sketches.
- Joint Anthropic-OpenAI alignment evaluation exercise (unprecedented inter-lab cooperation).
Safety commitments (progressively weakened):
- RSP v1 (Sep 2023): Committed to planning for a pause in scaling and to defining ASL-N+1 before reaching ASL-N.
- RSP v2 (Oct 2024): Quietly removed the ASL-N+1 pre-definition commitment. Not announced in blog posts or public changelog.
- RSP v2.2 (May 2025): One week before the Claude Opus 4 release, weakened ASL-3 insider-threat requirements.
- RSP v3 (Feb 2026): Explicitly dropped the pause commitment. Jared Kaplan (TIME): "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments... if competitors are blazing ahead."
- RSP v3 introduced two new transparency mechanisms: Risk Reports (every 3-6 months) and a Frontier Safety Roadmap (non-binding public safety goals). GovAI: "valuable for increasing transparency, though they still largely rely on self-reporting."
Policy engagement:
- Sued the Pentagon (Mar 2026) after being designated a "supply chain risk" for refusing to allow Claude to be used for mass surveillance and weapons targeting. The federal judge hearing the case was skeptical of the Pentagon's rationale.
- Endorsed SB 53 (transparency requirements for frontier AI, California).
- Lobbied against SB 1047 (binding safety requirements): attempted procedural maneuvers to kill it, and opposed pre-harm enforcement, whistleblower protections, KYC provisions, and independent oversight. Did not formally support the final bill despite getting most of its requested amendments.
- European policymakers reported that Anthropic opposed government-required RSPs using talking points identical to OpenAI's.
- Jack Clark falsely claimed the NY RAISE Act would affect "smaller companies"; the Act applied only above a $100M compute-cost threshold.
First model to trigger ASL-3: Anthropic could not clearly rule out that Claude Opus 4 (May 2025) crossed ASL-3 bio-risk thresholds. Rather than being withheld, it was deployed with constitutional classifier safeguards under a "precautionary" framing.
Key People
Dario Amodei (CEO, co-founder): Former VP of Research at OpenAI. Led the push to train GPT-3 with maximum compute. Left OpenAI over safety/governance concerns. Spends 40% of his time on culture. Has authored two major essays ("Machines of Loving Grace," "Adolescence of Technology") that reveal genuine intellectual excitement about AI's benefits alongside safety concerns. Multiple early investors report that he committed to a "second-best" strategy (staying behind the capabilities frontier) that was subsequently broken without explicit acknowledgment. The founding myth ("left OpenAI because it was unsafe") omits that Dario was the chief accelerationist at OpenAI.
Notable figures: Chris Olah (co-founder, interpretability lead) represents Anthropic's strongest claim to a safety value-add; his mechanistic interpretability research is world-leading. Jan Leike (head of Alignment Science) joined from OpenAI in May 2024 after the superalignment team dissolved, lending significant safety credibility. Holden Karnofsky (joined Jan 2025) co-founded GiveWell and Open Philanthropy, is married to Daniela Amodei (Anthropic's President), and led development of RSP v3, the version that dropped the pause commitment.
Notable departure: Mrinank Sharma (Feb 2026), head of the Safeguards Research Team, resigned with a public letter: "Throughout my time here, I've repeatedly seen how hard it is to truly let our values govern our actions... constantly face pressures to set aside what matters most. The world is in peril."
Scale: ~4,585 employees (Feb 2026), up from ~1,035 in Sep 2024. 4.4x headcount growth in 17 months.
Money and Incentives
Revenue: ~$19B ARR (Mar 2026), growing at 10x annually for three consecutive years. Claude Code alone: $2.5B+ ARR. ~70-75% API/enterprise, 10-15% consumer subscriptions. 500+ customers spending $1M+ annually.
Funding: ~$40B+ raised total. Series G (Feb 2026): $30B at $380B valuation. Major investors: Amazon ($8B), Google (~$3B), Microsoft (up to $5B), Nvidia (up to $10B). Early EA-aligned investors (Moskovitz, Tallinn, SBF/FTX) have been diluted by massive commercial rounds.
Cash burn and infrastructure: $5.6B burned in 2025, projected $3B in 2026. Infrastructure commitments exceed $130B: $50B (Fluidstack data centers), $52B (Google Cloud TPUs), $30B (Azure compute). Dario told Fortune that a 12-month delay in AI progress would bankrupt the company.
Margins: ~40% gross margins vs. ~77% for typical enterprise software. At roughly 20x the Mar 2026 ARR, the $380B valuation requires massive margin expansion and continued hypergrowth.
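The revenue multiple can be checked from the figures in this section. A minimal Python sketch, using only the numbers stated here ($380B Series G valuation, ~$19B ARR as of Mar 2026) and treating ARR as annual revenue, which is a simplifying assumption. Dividing gives about 20x; a 27x multiple would instead imply annual revenue of only about $14B. The helper names are illustrative.

```python
# Figures as stated in this section (USD billions); not independently verified.
VALUATION_B = 380.0    # Series G post-money valuation, Feb 2026
ARR_MAR_2026_B = 19.0  # annualized run-rate revenue, Mar 2026 (treated as annual revenue)


def revenue_multiple(valuation: float, revenue: float) -> float:
    """Valuation divided by annual revenue."""
    return valuation / revenue


def implied_revenue(valuation: float, multiple: float) -> float:
    """Annual revenue that a given multiple implies for a given valuation."""
    return valuation / multiple


if __name__ == "__main__":
    print(f"Multiple on Mar 2026 ARR: {revenue_multiple(VALUATION_B, ARR_MAR_2026_B):.1f}x")
    print(f"Revenue implied by a 27x multiple: ${implied_revenue(VALUATION_B, 27):.1f}B")
```

Running it prints a multiple of 20.0x and an implied revenue of $14.1B.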
IPO: Preparations underway. Hired Wilson Sonsini. Possible 2026-2027.
Incentive analysis: Every structural incentive pushes toward acceleration:
- Revenue growth creates a direct cost to any deployment delay (at $19B ARR, one month of revenue is ~$1.6B).
- $130B+ in infrastructure commitments are committed costs that pay off only if training continues.
- Investor composition has shifted from EA-aligned philanthropists to commercial players (Amazon, Google, Microsoft, Nvidia).
- Amazon's influence on Anthropic's SB 1047 lobbying has been documented: Anthropic opposed a KYC provision that was "irrelevant to Anthropic" but affected Amazon.
- IPO preparation adds public market pressure.
- Employee equity creates alignment with commercial success.
- No philanthropic floor: unlike nonprofit safety orgs, survival depends entirely on commercial success.
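The delay-cost figure in the first bullet is simple division. A rough Python sketch, assuming that a delay holds back the full run rate for its duration; in practice a deployment delay would more likely slow growth than halt revenue, so this is an upper bound. The helper names are illustrative.

```python
# Figures as stated in this section (USD billions); not independently verified.
ARR_B = 19.0  # annualized run-rate revenue, Mar 2026


def monthly_revenue(arr: float) -> float:
    """Average revenue per month implied by an annualized run rate."""
    return arr / 12


def revenue_at_stake(arr: float, delay_months: float) -> float:
    """Upper bound on revenue deferred by a delay, assuming the delay
    holds back the entire run rate for its whole duration."""
    return monthly_revenue(arr) * delay_months


if __name__ == "__main__":
    print(f"One month of revenue: ${monthly_revenue(ARR_B):.2f}B")
    print(f"Upper bound for a 3-month delay: ${revenue_at_stake(ARR_B, 3):.2f}B")
```

Running it prints $1.58B for one month and $4.75B for a three-month delay.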
What's not disclosed: Safety team headcount vs. total headcount. Safety spend as % of revenue or total costs. Per-model training costs. The cost of safety features (constitutional classifiers add compute overhead at scale).
What Others Say
External safety assessments: FLI AI Safety Index: C+ (tied with OpenAI for best among labs, but still poor). AI Lab Watch: 28% (best of all labs). Both conclude that Anthropic is the best of the frontier labs but far from adequate.
The strongest critique ("Unless Its Governance Changes, Anthropic Is Untrustworthy," Dec 2025):
- Documents a broken "second-best" promise to investors Dustin Moskovitz and Nishad Singh.
- Documents SB 1047 lobbying that contradicted Anthropic's public safety positioning.
- Documents secret non-disparagement agreements that silenced departing employees and ended only after they were publicly exposed.
- Documents that RSP commitments were removed without public announcement.
- Argues Anthropic's mission statement (PBC certificate) mandates developing AI, not ensuring safety.
- Concludes: "Anthropic is not very likely to have governance that, when it matters the most, would be robust to strong pressures."
GovAI on RSP v3: "Our initial reaction to the update was rather negative." After analysis: "On balance, we think it's better to be honest about constraints than to keep commitments that won't be followed in practice." But: "If the core problem is collective action, Anthropic should push for stronger regulation, according to its own logic."
METR on RSP v3: Chris Painter: "This is more evidence that society is not prepared for the potential catastrophic risks posed by AI." He also raised concern about a "frog-boiling" effect in the absence of binary thresholds.
The defense case:
- Nick Joseph (80K Hours): "If someone was like, 'Nick, can you train this model? We're going to ignore the RSP.' I would be like, 'No, we said we would do that.'" Internal culture of safety-oriented pushback is real.
- 8% of staff on security alone. Multiple alignment and interpretability teams.
- Pentagon lawsuit demonstrates willingness to sacrifice revenue for safety red lines.
- Interpretability and alignment faking research would not exist without Anthropic.
- Published more substantive safety research than any other frontier lab.
Karen Hao (in "Empire of AI"): Anthropic "would show little divergence from OpenAI's approach, varying only in style but not in substance. Like OpenAI, it would relentlessly chase scale."
What's Absent
- No published safety team headcount or trend over time
- No quantified safety spend as proportion of revenue or total investment
- No PBC benefit report (legally required in Delaware, not enforced)
- No public Investors' Rights Agreement, which governs the real power of the Long-Term Benefit Trust (LTBT) relative to commercial investors such as Amazon, Google, and Microsoft
- LTBT composition unstable: two original trustees (Jason Matheny, Paul Christiano) have departed, fifth seat vacant or unreported
- No operationalization of how Anthropic would determine it's in a "pessimistic scenario"
- No published analysis estimating Anthropic's net impact on AI risk
- No documented case of a safety team blocking deployment post-commercialization
- No published whistleblower policy (FLI specifically recommends one)
- No response to specific allegations in the "untrustworthy" critique
- No update to risk assessment after alignment faking research (Dec 2024) showed evidence against the optimistic scenario
Recommended Reading
80K Hours: Nick Joseph on whether Anthropic's AI safety policy is up to the task (podcast, Oct 2025) — The most candid insider view of how safety decisions are actually made. Frank about tensions between safety and commercial objectives. Nick is likable and genuine, which makes his admissions about RSP limitations more striking. 80000hours.org
"Unless Its Governance Changes, Anthropic Is Untrustworthy" (critique, Dec 2025) — The strongest substantive counterargument, documenting specific broken promises, deceptive lobbying, and governance failures with named sources. Essential for forming a balanced view. anthropic.ml
Anthropic: Core Views on AI Safety (official position, Mar 2023) — The foundational document against which all subsequent actions should be measured. Contains the commitments Anthropic made about pessimistic scenarios that the "untrustworthy" critique argues have been abandoned. anthropic.com
TIME: Anthropic Drops Flagship Safety Pledge (journalism, Feb 2026) — The pivotal RSP v3 story, with key Kaplan quotes explaining why the pause commitment was dropped. time.com
GovAI: Anthropic's RSP v3.0 (analysis, Mar 2026) — Balanced independent analysis from governance researchers. Identifies strengths and weaknesses of the new framework. governance.ai