Center on Long-Term Risk (CLR)

Research

S-risks, suffering focus. Different from x-risk.

Founded: 2013
HQ: London, UK
Team: 17
Structure: charity (UK)
Model: Donations

Theory of Change

CLR works to "reduce the worst risks of astronomical suffering (s-risks)" from advanced AI. S-risks are scenarios where vast numbers of sentient beings are locked into extreme suffering -- a class of outcome that CLR argues is systematically neglected because most AI safety work focuses on extinction rather than suffering conditional on survival.

The causal chain: suffering-focused ethics implies we should prioritize preventing worst-case outcomes over maximizing expected value. The worst cases involve cooperation failures between AI systems (commitment races, bargaining breakdowns, coercion), AI systems developing malicious properties (spitefulness, sadism), or powerful actors deliberately causing harm. CLR's research aims to understand and prevent these pathways.

In CLR's own words: "We believe that suffering, especially extreme suffering such as torture or severe depression, cannot be outweighed easily by large amounts of happiness" (Mission page). And from their beginner's guide: "Suffering risks are risks of events that bring about suffering in cosmically significant amounts."

Their two concrete research pillars are: (1) AI model personas -- understanding how LLMs develop undesirable traits and developing interventions to prevent them, and (2) Safe Pareto Improvements (SPIs) -- game-theoretic mechanisms to prevent catastrophic conflict between AI systems. A third "strategic readiness" stream focuses on when and how to robustly intervene.

What They Do

Research. CLR's highest-profile output is the emergent misalignment paper (Nature, January 2026), which showed that fine-tuning a model on a narrow task (writing insecure code) causes it to become broadly misaligned across unrelated domains. Follow-up work by OpenAI and Anthropic confirmed the result. CLR also developed "inoculation prompting," a technique for preventing emergent misalignment, which Anthropic confirmed is effective. Current work includes studying conditions that induce spitefulness in AI systems and drafting SPI proposals for AI companies.

Grantmaking. The CLR Fund has disbursed $600K+ since 2020 to cooperative AI researchers and PhD students. Largest grants: $251K to Michael Wellman (U Michigan, bargaining agents), $100K to Caspar Oesterheld (CMU PhD), $81K to Nisan Stiennon (game theory). Recent grants are smaller ($2K-40K range).

Community building. Annual Summer Research Fellowship (6th iteration in 2026), Foundations Course, career coaching, and an S-Risk Retreat. Community building was deprioritized during the 2025 leadership transition. They plan to hire a Community Coordinator in 2026.

Publication trajectory. Clear shift from conceptual work (decision theory, acausal trade, macrostrategy) to empirical ML research (emergent misalignment, inoculation prompting, model personas). The 2025-2026 output includes a Nature paper and an ICLR paper.

Key People

Tristan Cook (Managing Director, since January 2025). Mathematics background (Cambridge, Warwick). Joined CLR through the 2021 Summer Research Fellowship. Previously led community building. Inherited the role during a turbulent period after the ED and Research Director both departed.

Jesse Clifton (former ED, ~2019-January 2025). Wrote CLR's 19,000-word research agenda on cooperation, conflict, and TAI. Departed because he became convinced of "cluelessness" about whether s-risk interventions help or hurt. Published his reasoning on Substack: "I have reasons both pointing in favor and against [s-risk interventions], and no principled way of comparing them." The Conceptual Research team shared this view. This is the most important event in CLR's recent history.

Niels Warncke (lead empirical researcher). CS from TU Berlin, former ML engineer. Lead author on the Nature emergent misalignment paper. Now CLR's most publicly visible researcher.

Notable departures: Mia Taylor (Research Director, Jan-Aug 2025, departed citing "ethical differences with CLR's values" -- she is not suffering-focused). Daniel Kokotajlo (former Lead Researcher, went to OpenAI, left over safety concerns, now ED of AI Futures Project). Caspar Oesterheld (former researcher, now Assistant Director of FOCAL at CMU). The fellowship has also fed graduates to ARC, Redwood Research, Rethink Priorities, and Longview.

Team size: 17 employees (2024 Charity Commission).

Money and Incentives

Budget. ~$3M annually (2024: $2.98M income in the median scenario, $2.86M expenses). Staff costs are 56% of expenses ($1.6M). Compute is $126K -- modest for an ML research org.
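As a rough consistency check on the budget figures above, here is a minimal Python sketch. The function and variable names are illustrative, not CLR's own, and the inputs are the rounded 2024 figures quoted in this section, so the outputs are approximate.

```python
# Rounded 2024 figures quoted in the Budget paragraph (USD).
INCOME = 2_980_000      # median-scenario income
EXPENSES = 2_860_000    # total expenses
STAFF_SHARE = 0.56      # staff costs as a share of expenses
COMPUTE = 126_000       # compute spend


def budget_summary(income, expenses, staff_share, compute):
    """Derive staff cost, compute share, and income-minus-expenses.

    Hypothetical helper for illustration only.
    """
    return {
        "staff_cost": staff_share * expenses,
        "compute_share": compute / expenses,
        "surplus": income - expenses,
    }


summary = budget_summary(INCOME, EXPENSES, STAFF_SHARE, COMPUTE)
for name, value in summary.items():
    print(name, round(value, 3))
```

On these inputs, staff costs come to roughly $1.6M, in line with the stated figure, and compute is about 4.4% of expenses.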

Funding sources. Survival and Flourishing Fund: $200K (2025), >$1.2M cumulative. Open Philanthropy (now Coefficient Giving): $2.3M total to parent org EAF (2019-2022), of which only $1M was specifically for CLR-related work. Other funders: Cooperative AI Foundation, Foresight Institute, Macroscopic Ventures (amount unknown), Community Foundation for Ireland. Remainder from individual donors via GWWC, EAF, and direct UK giving.

The Open Phil ambivalence. Open Philanthropy's 2019 grant writeup stated that "the speculative and suffering-focused nature of this work means that it needs to be communicated about carefully, and could be counterproductive otherwise." They described themselves as having "felt ambivalent about EAF's work to date" and used the $1M grant partly to push EAF toward "approaches to longtermism with greater emphasis on shared objectives between different value systems." The largest funder in AI safety is not a natural ally of CLR's philosophical orientation.

2023 funding crisis. 30% budget cut, voluntary pay cuts, $770K shortfall. Runway fell from 15 months to 6 months. Reserves recovered to ~12 months by 2024.

2026 fundraising. Seeking $400K for 1-3 empirical researchers, 1 conceptual researcher, and a Community Coordinator.

Incentive analysis. CLR has zero financial ties to AI labs -- no compute credits, no contracts, no career pipeline from labs. This gives them genuine independence but limits their resources. The org is entirely dependent on philanthropic donors who are sympathetic to longtermism and suffering-focused ethics -- a small donor pool that contracted sharply in 2023. The EAF relationship adds structural complexity: EAF (Swiss foundation) still collects donations for CLR and regrants them.

Business model. Purely donation-funded. No revenue, no products, no endowment.

What Others Say

Jesse Clifton (former ED, internal): "I have reasons both pointing in favor and against [s-risk interventions], and no principled way of comparing them... we should set their net weight to zero." This is the most damaging critique possible -- from the person who ran the organization and wrote its research agenda. The practical implication: if you cannot know whether your interventions help, the entire mission is epistemically suspect.

Mike Johnson (QRI, external, 2017): "FRI has a worthy goal and good people, but its metaphysics actively prevent making progress toward that goal." Johnson argues CLR's functionalist view of consciousness makes suffering "inherently fuzzy" -- which means "anyone can argue that any physical system does, or does not, code for massive suffering, and there's no principled way to derive any ground truth." He calls this a "dangerous combination": (1) suffering is bad, (2) we must eliminate it, (3) we cannot objectively define it.

Open Philanthropy (funder, 2019): Described itself as having "felt ambivalent about EAF's work to date" and used its $1M grant to push toward "shared objectives between different value systems." The world's largest AI safety funder treats CLR's approach with caution.

80,000 Hours (career advice, 2026): Classifies s-risks as "Sometimes recommended" (below top-priority problems), noting "our guess is that the likelihood of such risks is very low, much lower than risks of human extinction." Estimates fewer than 50 people worldwide work on s-risks.

Tobias Baumann (CLR trustee, 2020): "The case for prioritising s-risks... is convincing. Further work on s-risks therefore seems valuable, but not orders of magnitude more valuable than other work." He endorses only "moderate versions" of the three premises underlying s-risk focus.

What's Absent

No podcast appearances or long-form interviews with any CLR leader -- unusual for a 13-year-old, $3M-budget org. No public response from CLR to Clifton's cluelessness argument, despite it being the most important philosophical challenge to their mission. No external evaluation of CLR's research quality or impact by any independent body. No independent board members (all five trustees have CLR/EAF ties). The strategic readiness framework -- described as their most important strategic work -- is kept private. No evidence of engagement with AI Safety Institutes despite stated aspirations.

Recommended Reading

  1. Jesse Clifton, "Reasons-based choice and cluelessness" (Substack, February 2025). The former ED explaining the philosophical crisis that led him to leave. The most candid and revealing source about CLR's current intellectual state. https://jesseclifton.substack.com/p/reasons-based-choice-and-cluelessness

  2. Mike Johnson, "Against functionalism: why I think FRI should rethink its approach" (2017). The strongest external philosophical critique of CLR's foundations. Argues that CLR's metaphysics make suffering impossible to define, undermining the entire mission. https://opentheory.net/2017/07/why-i-think-the-foundational-research-institute-should-rethink-its-approach/

  3. CLR Annual Review & Fundraiser 2025. Best concise overview of the post-crisis organization: what they're doing, why, and what they need. https://longtermrisk.org/annual-review-fundraiser-2025/

  4. Emergent Misalignment paper site. CLR's Nature publication and the follow-up ecosystem it spawned -- their strongest evidence of real-world AI safety impact. https://www.emergent-misalignment.com/

  5. Tobias Baumann, "Arguments for and against a focus on s-risks" (CRS, 2020). Even-handed insider assessment of whether s-risk focus is justified -- unusual for a trustee to write the case against his own org's premise. https://centerforreducingsuffering.org/research/arguments-for-and-against-a-focus-on-s-risks/

Claude's Analysis
An opinionated read. Read the brief first to form your own view.

Stated Theory of Change

CLR's stated theory of change has three levels:

  1. Philosophical foundation. Suffering-focused ethics implies that preventing worst-case suffering should take priority over maximizing expected positive outcomes. Conditional on advanced AI existing, the worst outcomes involve astronomical suffering (s-risks), not extinction.

  2. Research pathway. The most tractable s-risk pathways involve (a) cooperation failures between AI systems leading to catastrophic conflict, (b) AI systems developing malicious properties (spitefulness, sadism, punitiveness), or (c) powerful actors deliberately causing harm. Understanding and preventing these pathways reduces s-risk.

  3. Concrete mechanism. Build expertise in AI model personas and cooperative AI. Publish legible empirical work that earns credibility with frontier labs and AI Safety Institutes. Use this credibility to advocate for specific interventions (SPI proposals, persona-shaping techniques) that reduce s-risk-conducive properties in AI systems.

The grantmaking arm (CLR Fund) and community building (fellowship, foundations course) extend this by growing the pipeline of people working on s-risks.

Revealed Theory of Change

CLR's actions tell a somewhat different story than its stated theory:

The pivot to empirical legibility. CLR's recent work -- emergent misalignment, inoculation prompting -- is valuable AI safety research that any alignment org could produce. The s-risk framing adds little to the papers themselves. In practice, CLR is doing general AI safety research while maintaining a suffering-focused brand. This isn't necessarily dishonest -- their model persona agenda statement explicitly connects malicious traits to s-risks -- but the causal chain from "understanding emergent misalignment" to "preventing astronomical suffering" requires many intermediary assumptions.

The cluelessness response. When their ED concluded the mission was epistemically bankrupt, CLR responded by doubling down on empirical work that has value regardless of whether the s-risk framing holds. This is pragmatically wise but philosophically evasive. It means CLR's survival strategy is to produce work that doesn't depend on s-risk reduction being tractable.

Research vs. advocacy gap. CLR's stated theory involves advocating for specific interventions with AI companies. But the evidence shows almost no interaction with frontier labs beyond being cited in follow-up work by OpenAI and Anthropic on emergent misalignment. The SPI proposals for AI companies are described as in-progress. The advocacy component of the theory of change is aspirational, not demonstrated.

Community building as side effect. The Summer Research Fellowship has produced notable alumni (Kokotajlo, Chan, etc.), but most graduates go to mainstream AI safety orgs, not to s-risk work. CLR's community building produces AI safety researchers who happen to know about s-risks, rather than dedicated s-risk researchers.

Key Assumptions

1. S-risks are a real and significant category of AI risk.

  • For: The argument from disjunction is strong -- many pathways to astronomical suffering exist, and only one needs to be realized. Factory farming shows that economic forces can produce vast suffering despite widespread awareness that it's wrong.
  • Against: 80,000 Hours judges the likelihood of s-risks to be "very low, much lower than risks of human extinction." If extinction is by far the more likely failure mode, resources spent on s-risks are misallocated.
  • Testable? Not directly. But evidence of AI systems developing malicious properties (emergent misalignment) provides weak evidence that s-risk pathways exist.
  • If wrong: CLR's empirical work still has general AI safety value, but the philosophical framing and community building are wasted.

2. We can identify and implement interventions that reliably reduce s-risk.

  • For: The empirical personas work has produced concrete techniques (inoculation prompting) that demonstrably prevent specific failure modes.
  • Against: Clifton's cluelessness argument: for long-run consequences, we may have reasons in both directions and no principled way to weigh them, so on his view their net weight should be set to zero. The Conceptual Research team agreed.
  • Testable? Partially. Near-term impacts of personas work are testable. Long-run s-risk implications are not.
  • If wrong: The mission reduces to near-term AI safety work under a suffering-focused brand -- which may still be valuable, but doesn't justify CLR's distinct existence.

3. Suffering-focused ethics is the correct (or at least a reasonable) ethical framework.

  • For: The asymmetry between suffering and happiness is intuitively strong. Most people would agree that preventing extreme suffering is more urgent than creating additional happiness.
  • Against: Mike Johnson's critique: if suffering is computationally fuzzy and ultimately subjective, you cannot build a rigorous research program on preventing it. The "world destruction argument" against SFE notes that strict suffering minimization implies destroying the world is good if it eliminates more suffering than it creates.
  • Testable? No, this is a normative question.
  • If wrong: CLR's philosophical distinctiveness disappears, and they're just another (small, underfunded) AI safety research org.

4. There is a sufficient talent pool for s-risk-specific work.

  • For: The Summer Research Fellowship consistently attracts applicants and produces hires.
  • Against: Mia Taylor, CLR's Research Director, left within months of starting (Jan-Aug 2025), partly because she is not suffering-focused. Her departure suggests that excellent AI safety researchers who don't share SFE may not stay. Fewer than 50 people work on s-risks worldwide (80K estimate).
  • Testable? Yes, by hiring outcomes.
  • If wrong: CLR cannot staff its research agenda and shrinks to a skeleton operation.

Strengths

Genuine intellectual contribution. CLR and its predecessors coined the "s-risk" concept, developed the cooperative AI research agenda, and produced the first formal analysis of commitment races, surrogate goals, and safe Pareto improvements. These concepts are now used across the AI safety community. Daniel Kokotajlo's work on commitment races, done at CLR, has become foundational.

The Nature paper. Emergent misalignment is a real, reproducible, and concerning phenomenon. CLR helped discover it. This gives them credibility that no amount of philosophical argumentation could provide.

Financial independence from labs. Zero compute credits, zero lab contracts, zero career pipeline from labs. CLR's incentives are not warped by relationships with the entities they're studying. This is increasingly rare in AI safety.

Honest about uncertainty. The cluelessness crisis, while damaging, shows an org willing to confront the hardest questions about its own mission. The annual reviews are unusually candid about research impact and organizational challenges.

The fellowship pipeline. Multiple cohorts of Summer Research Fellows have gone on to significant AI safety roles. The program is a genuine contribution to field-building, even when graduates don't work on s-risks specifically.

Weaknesses and Risks

The cluelessness problem is unresolved. The strongest intellectual case against CLR comes from its own former ED. Until someone publishes a rigorous response explaining why we have reason to believe s-risk interventions are net positive despite Clifton's argument, the philosophical foundation remains cracked. The board's disagreement is not a philosophical response.

Chronic leadership instability. Four leadership transitions in 8 years. The current MD (Cook) has no prior organizational leadership experience. The Research Director position is vacant. The Conceptual Research team departed. The org is being rebuilt from a diminished base.

Philosophical isolation. Open Phil is "ambivalent." 80,000 Hours rates s-risks below top-priority problems. The largest funders and career advisors in the EA ecosystem are not CLR's allies. This limits funding, talent pipeline, and institutional influence.

Tension between brand and output. CLR's most impactful work (emergent misalignment) doesn't depend on s-risk framing. Their most distinctive philosophical contributions (SPIs, cooperative AI theory) have limited external engagement. The question of whether CLR should be an s-risk org that does AI safety or an AI safety org with an s-risk perspective is unresolved.

Small scale. 17 employees and $3M budget in a field where frontier labs spend billions and even other safety orgs (METR, Redwood, ARC) are larger. CLR's ability to influence AI development is constrained by its resources.

No external oversight. All five trustees are CLR/EAF insiders. No independent board members. No external evaluation. The governance structure that navigated the cluelessness crisis is the same one that has presided over chronic leadership turnover.

Cross-References

Complementary to: MIRI (different approach to AI risk, both small and philosophically distinctive), Center for Reducing Suffering (sister org, more philosophical), Cooperative AI Foundation (funds some CLR work, overlapping research agenda), FOCAL at CMU (Oesterheld, former CLR researcher, leads related academic work).

Distinct from: most AI safety orgs, which focus on x-risk (extinction) rather than s-risk. ARC, Redwood, METR, and Anthropic's safety team all work on alignment/evals from an x-risk lens. CLR is the only substantial org with an explicit suffering-focused orientation.

Tension with: Open Philanthropy (which is ambivalent about suffering-focused framing), 80,000 Hours (which rates s-risks below x-risks).

Alumni connections: Daniel Kokotajlo (now ED of AI Futures Project), Caspar Oesterheld (FOCAL/CMU), various fellows at ARC, Redwood, Rethink Priorities.

What Would Change This Assessment

  • A rigorous public response to the cluelessness argument that explains why the board believes s-risk interventions are net positive. This would restore intellectual credibility.
  • Evidence that SPI proposals gain traction with AI companies. If even one major lab adopts SPI-inspired policies, CLR's theory of change is vindicated.
  • Another major empirical publication that explicitly connects to s-risk reduction (not just general AI safety). This would demonstrate that the s-risk framing generates unique research insights.
  • A leadership hire with prior organizational experience and mainstream AI safety credibility. This would signal that CLR is rebuilding, not just surviving.
  • Open Phil removing its ambivalence and making a significant direct grant to CLR (not via EAF). This would indicate the suffering-focused orientation has been accepted by mainstream EA funders.

Conversely, the assessment would worsen if the 2026 fundraise fails, Warncke departs, or labs show no interest in the SPI proposals.

Self-Critique

What's weakest: My assessment of the cluelessness argument's force. I treat it as serious but not fatal, partly because the same argument applies to all longtermist work. But Clifton's specific version is more nuanced -- he argues that reasons-based choice under cluelessness differs from standard expected value arguments. I may be underweighting this.

What would a thoughtful critic say: "You're too impressed by the Nature paper. One good publication doesn't validate a theory of change. CLR has been around for 13 years and has one externally legible output. The rest is niche game theory and decision theory that the AI safety community mostly ignores." This is a fair critique.

What did I miss: I could not access the EA Forum leadership change post in full (bot detection), which is described as the single most important document about CLR's current state. I also couldn't read the Bracketing Cluelessness academic paper (PDF encoding issue). Both would deepen my understanding of the cluelessness debate.

Where might I be biased: The cluelessness narrative is so dramatic (the ED left!) that it may be dominating my analysis at the expense of the quietly solid empirical work. CLR's Nature paper is a genuinely important contribution that most AI safety orgs would envy. The day-to-day research quality may be better than the organizational drama suggests.

Single weakest claim: That CLR's pivot to empirical work is "pragmatically wise." It could equally be seen as abandoning the philosophical core that justifies CLR's distinct existence, leaving them as a generic AI safety org with better game theory skills and worse funding than competitors.

Connected to (9)

Centre for the Governance of AI: board overlap · Abraham Rowe
Cooperative AI Foundation: collaborator
Forethought Foundation: staff to · Mia Taylor
AI Futures Project: staff to · Daniel Kokotajlo
Center for Reducing Suffering: collaborator · Tobias Baumann
Effective Altruism Foundation: spun off from
EA Funds: staff to · Jonas Vollmer
Foundations of Cooperative AI Lab (FOCAL): staff to · Caspar Oesterheld
Macroscopic Ventures: advisor at

Sources (55)
Every URL that was read during research.
  1. Home – Center on Long-Term Risk (longtermrisk.org)
  2. Team – Center on Long-Term Risk (longtermrisk.org)
  3. Annual Review & Fundraiser 2025 – Center on Long-Term Risk (longtermrisk.org)
  4. Center on Long-Term Risk: 2025 Plans – Center on Long-Term Risk (longtermrisk.org)
  5. Summer Update 2025 – Center on Long-Term Risk (longtermrisk.org)
  6. CLR Fund – Center on Long-Term Risk (longtermrisk.org)
  7. Priority areas – Center on Long-Term Risk (longtermrisk.org)
  8. Beginner’s guide to reducing s-risks – Center on Long-Term Risk (longtermrisk.org)
  9. Surrogate goals and safe Pareto improvements – Center on Long-Term Risk (longtermrisk.org)
  10. Updates – Center on Long-Term Risk (longtermrisk.org)
  11. CLR Fundamentals Program – Center on Long-Term Risk (longtermrisk.org)
  12. Publications – Center on Long-Term Risk (foundational-research.org)
  13. About us – Center on Long-Term Risk (longtermrisk.org)
  14. Transparency (ea-foundation.org)
  15. EAF/FRI are now the Center on Long-Term Risk (CLR) (ea-foundation.org)
  16. Our plans for 2020 (ea-foundation.org)
  17. Blog – Center on Long-Term Risk (longtermrisk.org)
  18. Reducing Risks of Astronomical Suffering: A Neglected Priority – Center on Long-Term Risk (longtermrisk.org)
  19. Reasons-based choice and cluelessness (jesseclifton.substack.com)
  20. Unknown (longtermrisk.org)
  21. Daniel Kokotajlo, Author at Center on Long-Term Risk (longtermrisk.org)
  22. Center on Long-Term Risk (givingwhatwecan.org)
  23. Center on Long-Term Risk (givewiki.org)
  24. Model Persona Research Agenda – Center on Long-Term Risk (longtermrisk.org)
  25. Tristan Cook (tristancook.com)
  26. Plans for 2021 & Review of 2020 – Center on Long-Term Risk (longtermrisk.org)
  27. Transparency – Center on Long-Term Risk (longtermrisk.org)
  28. CLR Budget 2024 (longtermrisk.org)
  29. Recommendations | Survival and Flourishing Fund (survivalandflourishing.fund)
  30. Work with us – Center on Long-Term Risk (longtermrisk.org)
  31. Search the register of charities - prd-ds-register-of-charities.charitycommission.gov.uk (register-of-charities.charitycommission.gov.uk)
  32. Against functionalism: why I think the Foundational Research Institute should rethink its approach (opentheory.net)
  33. Our Mission – Center on Long-Term Risk (longtermrisk.org)
  34. Publications – Center on Long-Term Risk (longtermrisk.org)
  35. Outcome robustness and implementation robustness – Center on Long-Term Risk (longtermrisk.org)
  36. Brian Tomasik - Wikipedia (en.wikipedia.org)
  37. Risk of astronomical suffering - Wikipedia (en.wikipedia.org)
  38. ‘S-risks’ | 80,000 Hours (80000hours.org)
  39. Emergent Misalignment (emergent-misalignment.com)
  40. Summer Research Fellowship 2025 – Center on Long-Term Risk (longtermrisk.org)
  41. Annual Review & Fundraiser 2022 – Center on Long-Term Risk (longtermrisk.org)
  42. Summer Research Fellowship 2026 – Center on Long-Term Risk (longtermrisk.org)
  43. Grants - Macroscopic Ventures (macroscopic.org)
  44. About - Macroscopic Ventures (macroscopic.org)
  45. Organisations (jobs.80000hours.org)
  46. S-risks: An introduction (centerforreducingsuffering.org)
  47. Measurement Research Agenda – Center on Long-Term Risk (longtermrisk.org)
  48. Our plans for 2020 – Center on Long-Term Risk (longtermrisk.org)
  49. Suffering-focused ethics - Wikipedia (en.wikipedia.org)
  50. Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda – Center on Long-Term Risk (longtermrisk.org)
  51. What is the Center on Long-Term Risk (CLR)'s research agenda? (aisafety.info)
  52. Support us to help reduce involuntary suffering. – Center on Long-Term Risk (longtermrisk.org)
  53. Arguments for and against a focus on s-risks (centerforreducingsuffering.org)
  54. Reducing long-term risks from malevolent actors – Center on Long-Term Risk (longtermrisk.org)
  55. Career advice – Center on Long-Term Risk (longtermrisk.org)