Anthropic

Frontier Lab

Safety-first lab. RSP. Constitutional AI. The benchmark.

Founded: 2021
HQ: San Francisco, CA
Team: 4,585
Structure: PBC
Model: Product Revenue

Theory of Change

Anthropic's stated theory of change has three parts:

The frontier model argument: "A major reason Anthropic exists as an organization is that we believe it's necessary to do safety research on 'frontier' AI systems. This requires an institution which can both work with large models and prioritize safety." Safety research requires building the most powerful models because "large models are qualitatively different from smaller models." (Core Views on AI Safety, Dec 2023)

The race-to-the-top argument: By demonstrating that safety can be commercially viable, Anthropic incentivizes competitors to adopt similar standards. Dario Amodei: "We're trying to pull the ecosystem in a direction where everyone can be the good guy." The Responsible Scaling Policy (RSP) was explicitly designed to serve as a template that others would copy, and OpenAI and DeepMind did adopt similar frameworks within months.

The portfolio approach: Anthropic hedges across optimistic (safety is easy), intermediate (safety requires significant effort), and pessimistic (safety may be impossible) scenarios. In pessimistic scenarios, Anthropic committed to "sound the alarm so that the world's institutions can channel collective effort towards preventing the development of dangerous AIs" and to "always act under the assumption that we still may be in such a scenario unless we have sufficient evidence that we are not." (Core Views, Dec 2023)

In practice, Anthropic has progressively weakened the commitments that would bind it in pessimistic scenarios while scaling aggressively — see "Money and Incentives" and "What Others Say" below.

What They Do

Products: Claude model family (Opus, Sonnet, Haiku) accessed via API, web, iOS/Android, and enterprise. Claude Code (agentic coding tool, $2.5B+ ARR). Claude Cowork (autonomous desktop agent). MCP (Model Context Protocol, open standard for integrations). Claude for Healthcare, Claude for Excel. 300,000+ business customers.

Safety research (genuine and significant):

  • Mechanistic interpretability (Chris Olah's team): "Scaling Monosemanticity" (Jun 2024) extracted interpretable features from Claude 3 Sonnet using sparse autoencoders. The most advanced attempt to reverse-engineer neural network internals.
  • Alignment faking research (Dec 2024): Demonstrated that models can strategically comply with training to preserve preferences while appearing aligned. This is evidence against the "alignment is easy" scenario.
  • Constitutional AI: Using AI to supervise AI through articulated principles. Claude's new constitution (Jan 2026) is a 24K-word virtue ethics framework.
  • Constitutional classifiers: 95%+ jailbreak defense for CBRN content, with 1,700 hours of red-teaming.
  • Sabotage risk evaluations and ASL-4 safety case sketches.
  • Joint Anthropic-OpenAI alignment evaluation exercise (unprecedented inter-lab cooperation).

Safety commitments (progressively weakened):

  • RSP v1 (Sep 2023): Committed to plan for a pause in scaling, and to define the next AI Safety Level (ASL-N+1) before reaching ASL-N.
  • RSP v2 (Oct 2024): Quietly removed the ASL-N+1 pre-definition commitment. Not announced in blog posts or public changelog.
  • RSP v2.2 (May 2025): One week before Claude 4 Opus release, weakened ASL-3 insider threat requirements.
  • RSP v3 (Feb 2026): Explicitly dropped the pause commitment. Jared Kaplan (TIME): "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments... if competitors are blazing ahead."
  • RSP v3 introduced two new transparency mechanisms: Risk Reports (every 3-6 months) and a Frontier Safety Roadmap (non-binding public safety goals). GovAI: "valuable for increasing transparency, though they still largely rely on self-reporting."

Policy engagement:

  • Sued the Pentagon (Mar 2026) after being designated a "supply chain risk" for refusing to allow Claude in mass surveillance and weapons targeting. Federal judge skeptical of Pentagon's rationale.
  • Endorsed SB 53 (transparency requirements for frontier AI, California).
  • Lobbied against SB 1047 (binding safety requirements): attempted procedural maneuvers to kill it, lobbied against pre-harm enforcement, whistleblower protections, KYC provisions, and independent oversight. Did not formally support the final bill despite getting most requested amendments.
  • European policymakers reported Anthropic opposed government-required RSPs with talking points identical to OpenAI's.
  • Jack Clark made false claims about the NY RAISE Act affecting "smaller companies" (it had a $100M compute threshold).

First model to trigger ASL-3: Claude 4 Opus (May 2025) could not be clearly ruled below ASL-3 bio-risk thresholds. Deployed with constitutional classifier safeguards under a "precautionary" framing, not withheld.

Key People

Dario Amodei (CEO, co-founder): Former VP of Research at OpenAI. Led the push to train GPT-3 with maximum compute. Left OpenAI over safety/governance concerns. Spends 40% of time on culture. Has authored two major essays ("Machines of Loving Grace," "Adolescence of Technology") that reveal genuine intellectual excitement about AI benefits alongside safety concerns. Multiple early investors report that he committed to a "second-best" strategy (staying behind the capability frontier rather than pushing it) and that the commitment was later broken without explicit acknowledgment. The founding myth, "left OpenAI because it was unsafe," omits that Dario was the chief accelerationist at OpenAI.

Notable figures: Chris Olah (co-founder, interpretability lead) is the team's strongest claim to safety value-add; his mechanistic interpretability research is world-leading. Jan Leike (head of Alignment Science) joined from OpenAI in May 2024 after the superalignment team dissolved, lending significant safety credibility. Holden Karnofsky (joined Jan 2025) co-founded GiveWell/Open Philanthropy, was already married to Daniela Amodei (President) when he joined, and led development of RSP v3, the version that dropped the pause commitment.

Notable departure: Mrinank Sharma (Feb 2026), head of Safeguards Research Team, resigned with a public letter: "Throughout my time here, I've repeatedly seen how hard it is to truly let our values govern our actions... constantly face pressures to set aside what matters most. The world is in peril."

Scale: ~4,585 employees (Feb 2026), up from ~1,035 in Sep 2024. 4.4x headcount growth in 17 months.

Money and Incentives

Revenue: ~$19B ARR (Mar 2026), growing at 10x annually for three consecutive years. Claude Code alone: $2.5B+ ARR. ~70-75% API/enterprise, 10-15% consumer subscriptions. 500+ customers spending $1M+ annually.

Funding: ~$40B+ raised total. Series G (Feb 2026): $30B at $380B valuation. Major investors: Amazon ($8B), Google (~$3B), Microsoft (up to $5B), Nvidia (up to $10B). Early EA-aligned investors (Moskovitz, Tallinn, SBF/FTX) have been diluted by massive commercial rounds.

Cash burn and infrastructure: $5.6B burned in 2025, projected $3B in 2026. Infrastructure commitments exceeding $130B: $50B (Fluidstack data centers), $52B (Google Cloud TPUs), $30B (Azure compute). Dario told Fortune a 12-month delay in AI progress would bankrupt the company.

Margins: ~40% gross margins vs. ~77% for typical enterprise software. At $19B ARR, the $380B valuation is 20x revenue (the 27x figure corresponds to ARR of about $14B); either way, it requires massive margin expansion and continued hypergrowth.

IPO: Preparations underway. Hired Wilson Sonsini. Possible 2026-2027.

Incentive analysis: Every structural incentive pushes toward acceleration:

  • Revenue growth creates direct cost to any deployment delay (at $19B ARR, each month of delay costs ~$1.6B).
  • $130B+ in infrastructure commitments are sunk costs that only pay off if training continues.
  • Investor composition has shifted from EA-aligned philanthropists to commercial players (Amazon, Google, Microsoft, Nvidia).
  • Amazon's influence was documented affecting Anthropic's SB-1047 lobbying (against KYC provision that was "irrelevant to Anthropic" but affected Amazon).
  • IPO preparation adds public market pressure.
  • Employee equity creates alignment with commercial success.
  • No philanthropic floor: unlike nonprofit safety orgs, survival depends entirely on commercial success.
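The headline figures in this section can be sanity-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this brief; the variable names are illustrative, and treating one month of delay as one forgone month of run-rate revenue is a simplifying assumption, not Anthropic's own accounting.

```python
# Figures quoted in this brief (billions of USD unless noted).
arr = 19.0          # annualized revenue run rate, Mar 2026
valuation = 380.0   # Series G post-money valuation
infra_commitments = {
    "Fluidstack data centers": 50.0,
    "Google Cloud TPUs": 52.0,
    "Azure compute": 30.0,
}
headcount_sep_2024 = 1035
headcount_feb_2026 = 4585

# Assumption: a month of delay forgoes one month of current run-rate revenue.
monthly_revenue = arr / 12

# Sum of the three named infrastructure commitments.
infra_total = sum(infra_commitments.values())

# Revenue multiple at the Mar 2026 run rate, and the ARR a 27x multiple implies.
multiple_at_current_arr = valuation / arr
arr_implied_by_27x = valuation / 27

# Headcount growth over the 17 months from Sep 2024 to Feb 2026.
headcount_growth = headcount_feb_2026 / headcount_sep_2024

print(f"Revenue per month:        ${monthly_revenue:.2f}B")
print(f"Infrastructure total:     ${infra_total:.0f}B")
print(f"Multiple at $19B ARR:     {multiple_at_current_arr:.1f}x")
print(f"ARR implied by 27x:       ${arr_implied_by_27x:.1f}B")
print(f"Headcount growth:         {headcount_growth:.2f}x")
```

The results line up with the "~$1.6B per month" and "$130B+" claims and with the 4.4x headcount growth. They also show that a 27x revenue multiple corresponds to ARR of roughly $14B rather than $19B.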

What's not disclosed: Safety team headcount vs. total. Safety spend as % of revenue or total costs. Per-model training costs. Margins on safety features (constitutional classifiers add compute overhead at scale).

What Others Say

External safety assessments: FLI AI Safety Index: C+ (same as OpenAI, best among labs but still poor). AI Lab Watch: 28% (best of all labs). Both conclude Anthropic is best of the frontier labs but far from adequate.

The strongest critique ("Unless Its Governance Changes, Anthropic Is Untrustworthy," Dec 2025):

  • Documents broken "second-best" promise to investors Dustin Moskovitz and Nishad Singh.
  • Documents SB-1047 lobbying that contradicted public safety positioning.
  • Documents secret non-disparagement agreements that silenced departing employees, only ended when publicly exposed.
  • Documents that RSP commitment removals happened without public announcement.
  • Argues Anthropic's mission statement (PBC certificate) mandates developing AI, not ensuring safety.
  • Concludes: "Anthropic is not very likely to have governance that, when it matters the most, would be robust to strong pressures."

GovAI on RSP v3: "Our initial reaction to the update was rather negative." After analysis: "On balance, we think it's better to be honest about constraints than to keep commitments that won't be followed in practice." But: "If the core problem is collective action, Anthropic should push for stronger regulation, according to its own logic."

METR on RSP v3: Chris Painter: "This is more evidence that society is not prepared for the potential catastrophic risks posed by AI." Concerned about "frog-boiling" effect without binary thresholds.

The defense case:

  • Nick Joseph (80K Hours): "If someone was like, 'Nick, can you train this model? We're going to ignore the RSP.' I would be like, 'No, we said we would do that.'" Internal culture of safety-oriented pushback is real.
  • 8% of staff on security alone. Multiple alignment and interpretability teams.
  • Pentagon lawsuit demonstrates willingness to sacrifice revenue for safety red lines.
  • Interpretability and alignment faking research would not exist without Anthropic.
  • Published more substantive safety research than any other frontier lab.

Karen Hao (in "Empire of AI"): Anthropic "would show little divergence from OpenAI's approach, varying only in style but not in substance. Like OpenAI, it would relentlessly chase scale."

What's Absent

  • No published safety team headcount or trend over time
  • No quantified safety spend as proportion of revenue or total investment
  • No PBC benefit report (legally required in Delaware, not enforced)
  • No public Investors' Rights Agreement (governs real power of the Long-Term Benefit Trust, or LTBT, vs. commercial investors like Amazon/Google/Microsoft)
  • LTBT composition unstable: two original trustees (Jason Matheny, Paul Christiano) have departed, fifth seat vacant or unreported
  • No operationalization of how Anthropic would determine it's in a "pessimistic scenario"
  • No published analysis estimating Anthropic's net impact on AI risk
  • No documented case of a safety team blocking deployment post-commercialization
  • No published whistleblower policy (FLI specifically recommends one)
  • No response to specific allegations in the "untrustworthy" critique
  • No update to risk assessment after alignment faking research (Dec 2024) showed evidence against the optimistic scenario

Recommended Reading

  1. 80K Hours: Nick Joseph on whether Anthropic's AI safety policy is up to the task (podcast, Oct 2025) — The most candid insider view of how safety decisions are actually made. Frank about tensions between safety and commercial objectives. Nick is likable and genuine, which makes his admissions about RSP limitations more striking. 80000hours.org

  2. "Unless Its Governance Changes, Anthropic Is Untrustworthy" (critique, Dec 2025) — The strongest substantive counterargument, documenting specific broken promises, deceptive lobbying, and governance failures with named sources. Essential for forming a balanced view. anthropic.ml

  3. Anthropic: Core Views on AI Safety (official position, Dec 2023) — The foundational document against which all subsequent actions should be measured. Contains the commitments Anthropic made about pessimistic scenarios that the "untrustworthy" critique argues have been abandoned. anthropic.com

  4. TIME: Anthropic Drops Flagship Safety Pledge (journalism, Feb 2026) — The pivotal RSP v3 story, with key Kaplan quotes explaining why the pause commitment was dropped. time.com

  5. GovAI: Anthropic's RSP v3.0 (analysis, Mar 2026) — Balanced independent analysis from governance researchers. Identifies strengths and weaknesses of the new framework. governance.ai

Claude's Analysis
An opinionated read. Read the brief first to form your own view.

Stated Theory of Change

Anthropic's theory of change is a three-part argument:

  1. You need frontier models to do safety research. Large models are qualitatively different from small ones. Safety techniques like Constitutional AI and debate only work on large models. Safety problems that matter (deception, power-seeking, strategic behavior) may only emerge at scale. Therefore, a serious safety research program must build frontier models.

  2. You need commercial success to sustain frontier research. Training frontier models costs billions. The only sustainable funding model is commercial revenue. Therefore, Anthropic must compete commercially — and winning commercially proves that safety is compatible with success, incentivizing others to invest in safety.

  3. You create a race to the top. By publishing safety frameworks (RSP), demonstrating safety techniques (Constitutional AI, interpretability), and making safety commercially viable, you pressure competitors to match your standards. If the safety-focused lab is also the commercially successful lab, you've shown the industry that safety pays.

The chain: Build frontier models -> Do safety research on them -> Publish findings -> Demonstrate commercial viability of safety -> Competitors adopt similar standards -> Industry-wide safety improves -> Regulation builds on voluntary standards.

Revealed Theory of Change

Anthropic's actions suggest a theory of change that differs from the stated one in important ways:

What they actually optimize for: Building the most capable AI models and capturing maximum market share, while maintaining enough safety credibility to retain talent and brand differentiation. The evidence:

  • Broke the "second-best" promise. Claude 3 Opus (Mar 2024) pushed the capability frontier. Subsequent releases routinely do.
  • Systematically weakened safety commitments (RSP v1 -> v2 -> v2.2 -> v3) as models got more capable and commercial stakes grew.
  • Committed $130B+ in infrastructure while maintaining the theoretical possibility of a pause.
  • Lobbied against binding safety regulation while publicly calling for regulation.
  • Never published a net risk analysis of whether Anthropic's existence reduces or increases total AI risk.

The revealed theory is closer to: "We are going to build the most powerful AI because someone will, and we are marginally better than the alternatives. Safety research is real but secondary to competitive position. If things get bad enough, we'll sound an alarm — but the definition of 'bad enough' will keep shifting as commercial stakes grow."

Where stated and revealed diverge most:

  • The stated theory says Anthropic would pause if necessary. The revealed behavior shows every structural incentive against pausing and the explicit removal of the pause commitment.
  • The stated theory says Anthropic acts under the assumption of a possibly pessimistic scenario. After the alignment faking paper provided evidence against the optimistic scenario, Anthropic did not publicly update its risk assessment or strategy.
  • The stated theory says Anthropic supports binding regulation. The revealed lobbying record shows opposition to any regulation that would actually constrain Anthropic's behavior.

Key Assumptions

Assumption 1: Safety research requires frontier models.

  • Evidence for: Alignment faking only emerged in large models. Interpretability techniques scale differently at different model sizes. Constitutional AI requires models capable of understanding nuanced principles.
  • Evidence against: Much safety research can be done on smaller models (MIRI, ARC have argued this). Building frontier models contributes to the very race that creates danger. The "need frontier models" argument is unfalsifiable — it can always justify the next bigger model.
  • Testable? Partially. If Anthropic's safety research produces techniques that measurably improve safety of others' models, the argument strengthens. If the techniques only work on Anthropic's own models, it weakens.
  • If wrong: Anthropic is a net accelerator of AI risk, using safety branding to attract talent and capital that would otherwise flow to pure safety orgs.

Assumption 2: Commercial success enables safety research.

  • Evidence for: Anthropic now has resources (4,585 employees, billions in revenue) that no pure safety org can match. Interpretability research at this scale is unprecedented.
  • Evidence against: Commercial success creates incentives that warp safety priorities. Every dollar of revenue creates pressure not to delay the next model. SB-1047 lobbying was influenced by Amazon's commercial interests, not safety analysis.
  • If wrong: Commercial success corrupts the safety mission over time, and Anthropic becomes functionally indistinguishable from other frontier labs (Karen Hao's thesis).

Assumption 3: The "race to the top" works.

  • Evidence for: OpenAI and DeepMind adopted RSP-like frameworks after Anthropic published theirs. Some regulatory frameworks (SB 53, EU Code of Practice) draw on voluntary standards.
  • Evidence against: Competitors' frameworks are no more binding than Anthropic's. The frameworks weaken together (GovAI concern). xAI has no meaningful safety framework. The "race to the top" hasn't prevented any lab from releasing any model. At best, it's added some safety testing to the release process — important but insufficient.
  • If wrong: Safety frameworks become industry-wide greenwash, providing the appearance of responsibility without the substance. Regulation built on voluntary standards inherits their weakness.

Assumption 4: Anthropic's governance is robust enough to hold under pressure.

  • Evidence for: Internal culture appears genuinely safety-oriented. Pentagon lawsuit shows willingness to sacrifice revenue. Multiple teams dedicated to safety.
  • Evidence against: Every governance mechanism has been weakened or shown to be weaker than presented. LTBT cannot fire CEO. PBC obligations are permissions, not mandates. RSP is self-enforced and self-modifiable. Board trend is toward political access, not safety expertise. Secret non-disparagement agreements silenced departing employees. Investor composition has shifted from EA-aligned to pure commercial.
  • If wrong: When the critical moment comes — when a model is genuinely dangerous but commercially vital — governance will not hold. The OpenAI precedent (board fired CEO, lost) is instructive.

Strengths

  1. Interpretability research is world-leading and genuinely valuable. Chris Olah's team and the "Scaling Monosemanticity" work represent the most serious attempt to understand what's happening inside large models. This is the kind of research that probably wouldn't happen at the same scale anywhere else.

  2. Safety culture is real, not performative. Internal culture of pushback, 8%+ on security, multiple alignment teams, 40% of CEO time on culture. Nick Joseph's podcast provides credible evidence that individual employees care deeply and would resist unsafe decisions.

  3. Published safety research advances the field. Alignment faking, Constitutional AI, constitutional classifiers, sabotage evaluations, interpretability findings — these are genuine contributions that benefit the broader safety ecosystem.

  4. Pentagon lawsuit demonstrates a real red line. When the government pressured Anthropic to remove safeguards for surveillance and weapons targeting, Anthropic sued. This is the single strongest evidence that safety principles are more than marketing.

  5. Constitutional AI and the "soul document" (Claude's constitution) are novel approaches to alignment. Treating alignment as character development through virtue ethics rather than rule-following is an interesting and potentially valuable framework.

  6. Transparency is above average for frontier labs. System cards, RSP publication, safety case sketches, risk reports — more detailed public disclosure than any competitor.

Weaknesses and Risks

  1. Every structural incentive pushes toward acceleration. $19B ARR growing at 10x. $130B+ in infrastructure commitments. Commercial investor base. Impending IPO. Employee equity. No philanthropic floor. Dario says 12 months of delay means bankruptcy. The only thing preventing full acceleration is individual character — and character erodes under sustained pressure.

  2. The RSP trajectory is the strongest evidence against Anthropic's theory of change. The pattern — weaken commitments as commercial stakes grow, with plausible justification each time — is exactly what you'd predict from a company where incentives dominate stated values. The original RSP was the most binding safety commitment any lab had made. RSP v3 is meaningfully weaker. The trend is unambiguous.

  3. Lobbying contradicts stated positions. Publicly calling for regulation while privately opposing binding requirements is not honest. The SB-1047 episode — lobbying against whistleblower protections, pre-harm enforcement, and independent oversight while publicly appearing to support the bill — is documented deception, not ambiguity.

  4. Governance is weaker than presented. The LTBT cannot fire the CEO. Board members appointed by the LTBT still owe fiduciary duties to shareholders. The PBC is a permission to balance profit with mission, not a mandate. The RSP can be changed by the CEO with board approval. The Investors' Rights Agreement is not public. The critical question — "Would governance hold when it matters most?" — has no reassuring answer.

  5. The Amodei-Karnofsky-EA nexus creates significant conflicts. Daniela Amodei (President) is married to Holden Karnofsky (led RSP v3 development). Karnofsky co-founded Open Philanthropy, the dominant AI safety funder. LTBT trustees have deep EA/Open Phil ties. The community that evaluates Anthropic's safety claims has personal and financial entanglements with the company. This is not independent oversight.

  6. The "pessimistic scenario" commitment is unfalsifiable. No criteria for determining when Anthropic is in a pessimistic scenario. No described action protocol. After alignment faking research provided evidence against the optimistic scenario, no public strategy update. The commitment functions as reassurance, not as a binding plan.

  7. 4.4x headcount growth in 17 months dilutes founding culture. Most of the 4,585 employees did not join for safety reasons. As the commercial operation scales, the founding culture becomes a smaller fraction of the whole.

Cross-References

Compared to OpenAI: Same dynamics, slightly better execution. OpenAI's governance collapsed when the board tried to fire Altman. Anthropic's governance is explicitly designed to prevent that scenario — which means it's designed to prevent the strongest safety action a board can take. Anthropic's safety research is stronger than OpenAI's; its lobbying behavior is similar; its transparency is better; its financial incentive structure is identical.

Relationship to AI safety field: Anthropic is the largest employer of AI safety researchers. Its research advances the field. Its commercial success validates the "safety can be profitable" thesis. But its lobbying undermines regulatory efforts, and its weakening commitments undermine the credibility of voluntary frameworks industry-wide. GovAI: "Other companies may lower their commitments in response to the update."

Relationship to policy: Anthropic occupies an awkward position — too safety-oriented for the current administration (Pentagon supply chain risk designation), but not safety-oriented enough for the safety community (RSP v3 criticism). Jack Clark's policy team engages extensively but the lobbying record contradicts the public positioning.

What Would Change This Assessment

Positive updates (would make me more confident in Anthropic's theory of change):

  • Anthropic publishes quantified safety spend as proportion of total investment.
  • An external, independent audit of RSP compliance finds genuine adherence.
  • Anthropic actively lobbies FOR binding safety regulation that would constrain its own behavior.
  • A model is withheld or substantially delayed for safety reasons post-commercialization.
  • The Investors' Rights Agreement is published and shows LTBT power is robust.
  • Anthropic publishes a net risk analysis and concludes it's net positive with honest uncertainty.

Negative updates (would make me less confident):

  • RSP v4 further weakens commitments, especially in response to competitive pressure.
  • Safety team departures accelerate.
  • Post-IPO, safety investments are cut to meet quarterly targets.
  • Interpretability research stalls or is redirected toward commercial applications.
  • Another documented case of lobbying that contradicts public safety positioning.

Self-Critique

What's weakest in this analysis:

  • I may be over-weighting the "untrustworthy" critique relative to the defense case. The critique is detailed and specific, which makes it compelling, but it's written by someone with a clear thesis to advance. The defense — that Anthropic employees genuinely care about safety, that the research is real, that the Pentagon lawsuit shows genuine red lines — is harder to document but may be more important.
  • I don't have good data on what Anthropic's counterfactual impact is. If Anthropic didn't exist, would safety research happen elsewhere? Would the "race to the top" effect disappear? Would OpenAI be more or less safe? These counterfactual questions are crucial but unanswerable with current evidence.
  • My analysis of financial incentives applies equally to all frontier labs. The question of whether Anthropic is BETTER than the counterfactual lab that would exist in its place is the important question, and it's one I can't fully answer.

What a thoughtful person who disagrees would say: "You're holding Anthropic to an impossible standard. No company can commit to pause when competitors won't. No company can lobby for regulations that would put it at a competitive disadvantage. The RSP changes are honest acknowledgments of real constraints. Anthropic's safety research is more valuable than any commitment it could make, because commitments are just words but research is actual progress on the technical problem. The real measure is: is the world safer because Anthropic exists? And the answer is probably yes."

What information would most change my view: Quantified safety spend as a proportion of total investment, combined with an independent assessment of whether that research has actually made AI safer. If Anthropic spends 20%+ on genuine safety research that produces deployable safety techniques, the theory of change is much stronger. If it's 5% and declining, it's much weaker.

Sources (98)
Every URL that was read during research.
  1. Anthropic - Wikipedia (en.wikipedia.org)
  2. Company \ Anthropic (anthropic.com)
  3. The Long-Term Benefit Trust (anthropic.com)
  4. Anthropic's Responsible Scaling Policy (anthropic.com)
  5. Dario Amodei — Machines of Loving Grace (darioamodei.com)
  6. Constitutional AI: Harmlessness from AI Feedback (anthropic.com)
  7. Anthropic’s Responsible Scaling Policy: Version 3.0 (anthropic.com)
  8. How Anthropic Designed Itself to Avoid OpenAI’s Mistakes (time.com)
  9. Anthropic Long-Term Benefit Trust (corpgov.law.harvard.edu)
  10. Anthropic ditches its core safety promise in the middle of an AI red line fight with the Pentagon | CNN Business (cnn.com)
  11. What the Anthropic AI safety saga is really all about | CNN Business (cnn.com)
  12. Elizabeth Warren calls Pentagon's decision to bar Anthropic 'retaliation' | TechCrunch (techcrunch.com)
  13. Judge presses DOD on why Anthropic was blacklisted: 'That seems a pretty low bar' (cnbc.com)
  14. Transcript for Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | Lex Fridman Podcast #452 - Lex Fridman (lexfridman.com)
  15. Dario Amodei — "We are near the end of the exponential" (dwarkesh.com)
  16. Dario Amodei (Anthropic CEO) - Scaling, Alignment, & AI Progress (dwarkesh.com)
  17. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub)
  18. Core Views on AI Safety: When, Why, What, and How (anthropic.com)
  19. Anthropic Drops Flagship Safety Pledge (time.com)
  20. Anthropic (ailabwatch.org)
  21. Anthropic Researcher Quits in Cryptic Public Letter (futurism.com)
  22. OpenAI former safety leader Jan Leike joins rival AI startup Anthropic (cnbc.com)
  23. 2025 AI Safety Index - Future of Life Institute (futureoflife.org)
  24. Anthropic revenue, valuation & funding (sacra.com)
  25. Chris Liddell appointed to Anthropic’s board of directors (anthropic.com)
  26. Anthropic taps ex-Microsoft CFO, Trump aide Liddell for board (cnbc.com)
  27. Jay Kreps appointed to Anthropic's Board of Directors (anthropic.com)
  28. Anthropic Just Hit $14 Billion in ARR. Up From $1 Billion Just 14 Months Ago. (saastr.com)
  29. Nick Joseph on whether Anthropic's AI safety policy is up to the task | 80,000 Hours (80000hours.org)
  30. Dario Amodei — The Adolescence of Technology (darioamodei.com)
  31. On The Adolescence of Technology (thezvi.substack.com)
  32. On Dwarkesh Patel's 2026 Podcast With Dario Amodei (thezvi.substack.com)
  33. Anthropic’s RSP v3.0: How it Works, What’s Changed, and Some Reflections | GovAI (governance.ai)
  34. Anthropic's Responsible Scaling Policy (anthropic.com)
  35. Anthropic sues the Trump administration after it was designated a supply chain risk | CNN Business (cnn.com)
  36. Judge says government's Anthropic ban looks like punishment (npr.org)
  37. Report: Anthropic Business Breakdown & Founding Story | Contrary Research (research.contrary.com)
  38. Daniela and Dario Amodei on Anthropic - Future of Life Institute (futureoflife.org)
  39. Pentagon’s Anthropic Designation Won’t Survive First Contact with Legal System (lawfaremedia.org)
  40. Anthropic cofounder says studying the humanities will be 'more important than ever' and reveals what the AI company looks for when hiring | Fortune (fortune.com)
  41. $21 billion+ new pledge: Anthropic’s seven cofounders, Dario and Daniela Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Christopher Olah commit 80% of their fortunes to combat AI-driven inequality (lifestylesmagazine.com)
  42. View: White House feud with Anthropic reveals broader AI safety concerns (semafor.com)
  43. Exclusive: Pentagon’s Anthropic feud deepened after tense exchange over missile attacks (semafor.com)
  44. A Timeline of the Anthropic-Pentagon Dispute (techpolicy.press)
  45. Exclusive: Anthropic is Quietly Backpedalling on its Safety Commitments (obsolete.pub)
  46. Recommendations for Technical AI Safety Research Directions (alignment.anthropic.com)
  47. Research (anthropic.com)
  48. Unknown (anthropic.com)
  49. Announcing our updated Responsible Scaling Policy (anthropic.com)
  50. Anthropic's Responsible Scaling Policy (anthropic.com)
  51. Does the Anthropic–Pentagon feud mean the end of responsible AI? | The TechTank Podcast | Brookings (brookings.edu)
  52. Anthropic Drops Its Huge Safety Pledge That Was Supposedly the Whole Point of the Company (futurism.com)
  53. Chris Olah on what the hell is going on inside neural networks | 80,000 Hours (80000hours.org)
  54. Alignment faking in large language models (anthropic.com)
  55. Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation (anthropic.com)
  56. Anthropic raises another $30B in Series G, with a new value of $380B | TechCrunch (techcrunch.com)
  57. Anthropic - Wikipedia (en.wikipedia.org)
  58. The Anthropic Long-Term Benefit Trust (thestakehold.com)
  59. Google and Anthropic announce cloud deal worth tens of billions of dollars (cnbc.com)
  60. Activating AI Safety Level 3 protections (anthropic.com)
  61. Exclusive: New Claude Model Prompts Safeguards at Anthropic (time.com)
  62. Constitutional Classifiers: Defending against universal jailbreaks (anthropic.com)
  63. AI researchers are sounding the alarm on their way out the door | CNN Business (cnn.com)
  64. Anthropic invests $50 billion in American AI infrastructure (anthropic.com)
  65. The Growth Miracle and the Six Fractures: Anthropic at $380 Billion (shanakaanslemperera.substack.com)
  66. From shortcuts to sabotage: natural emergent misalignment from reward hacking (anthropic.com)
  67. Anthropic's Pilot Sabotage Risk Report (alignment.anthropic.com)
  68. Findings from a Pilot Anthropic—OpenAI Alignment Evaluation Exercise (alignment.anthropic.com)
  69. Unless Its Governance Changes, Anthropic Is Untrustworthy (anthropic.ml)
  70. Scaling Laws: Rapid Response to the Implications of Claude's New Constitution (lawfaremedia.org)
  71. Alignment Science Blog (alignment.anthropic.com)
  72. 72.Anthropic CEO disputes Trump AI czar David Sacks' claims that company is 'woke'cnbc.com
  73. 73.Introducing the Anthropic National Security and Public Sector Advisory Councilanthropic.com
  74. 74.Anthropic employees say they’ll give away billions. Where will it go?transformernews.ai
  75. 75.Jared Kaplan - Wikipediaen.wikipedia.org
  76. 76.TIME100 AI 2025: Jared Kaplantime.com
  77. 77.Claude’s Constitutionanthropic.com
  78. 78.Anthropic is endorsing SB 53anthropic.com
  79. 79.Claude's Constitutional Structurethezvi.substack.com
  80. 80.Microsoft, NVIDIA and Anthropic announce strategic partnerships - The Official Microsoft Blogblogs.microsoft.com
  81. 81.The Dirty Little Secret of Benefit Corporationsgodspeed.ghost.io
  82. 82.5 Little‑Known Facts About Anthropic’s AI Governance Approachaigovernancelead.substack.com
  83. 83.Reed Hastings appointed to Anthropic’s board of directorsanthropic.com
  84. 84.Introducing Anthropic's Safeguards Research Teamalignment.anthropic.com
  85. 85.Three Sketches of ASL-4 Safety Case Componentsalignment.anthropic.com
  86. 86.Anthropic's Frontier Safety Roadmapanthropic.com
  87. 87.Claude's new constitutionanthropic.com
  88. 88.Anthropic hired president Daniela Amodei's husband to work on the company's AI safety strategy | Fortunefortune.com
  89. 89.Anthropic eases AI safety restrictions to avoid slowing developmentsemafor.com
  90. 90.Covering electricity price increases from our data centersanthropic.com
  91. 91.Daniela Amodei - Wikipediaen.wikipedia.org
  92. 92.Unknownwww-cdn.anthropic.com
  93. 93.Amanda Askell - Wikipediaen.wikipedia.org
  94. 94.How Anthropic Builds Claude’s Personalitybigtechnology.com
  95. 95.Holden Karnofsky - Wikipediaen.wikipedia.org
  96. 96.Pentagon's Biggest Champion of Blacklisting Anthropic Has a Few Million Reasons for His Stancegizmodo.com
  97. 97.Anthropic’s Dario Amodei says he spends up to 40% of his time on company culture, not products, because it’s the only thing that will win the AI race | Fortunefortune.com
  98. 98.Defense tech companies are dropping Claude after Pentagon's Anthropic blacklistcnbc.com