SaferAI

Governance

Compute licensing proposals.

Founded: 2023
HQ: Paris, France
Team: 15
Structure: association loi 1901 (France)
Model: Mixed

Theory of Change

SaferAI's theory of change: AI risk management is decades behind other high-risk industries; external pressure through independent ratings, quantitative risk modeling, and embedding in international standards bodies can close this gap.

Founder Simeon Campos, in a 2022 interview: "AI auditing is one of the most promising way to ensure that if at some point there are existing alignment tools, they will be implemented actually by top organization and top labs that are developing AGI." SaferAI was conceived as the enforcement arm for alignment work done by others -- the bet is that labs might develop safety tools but won't use them without external accountability.

The implementation has three prongs:

Ratings: Publicly score frontier AI companies' risk management practices, creating reputational pressure to improve
Standards: Embed SaferAI personnel in the bodies writing AI standards (CEN-CENELEC, ISO, OECD, EU Code of Practice, NIST AISIC) so that aviation/nuclear-style risk management principles get baked into regulatory infrastructure
Risk modeling: Develop quantitative methods to translate AI benchmark scores into real-world risk estimates, giving regulators concrete tools rather than abstract frameworks

What They Do

Ratings (flagship product): First released Oct 2024 with 6 companies, expanded to 12 in Jul 2025. Framework has 65 criteria across risk identification, risk tolerance/analysis, risk mitigation, and risk governance. Selected second-round scores: Anthropic 35%, OpenAI 33%, Meta 22%, Google DeepMind 20%, xAI 18%. No company scored above "weak." Companies get two weeks' notice and right of reply before publication. SaferAI provides the Safety Frameworks domain analysis for the FLI AI Safety Index (all editions).

Standards engagement: James Gealy leads the high-risk AI risk management standard at CEN-CENELEC JTC 21 and edits the ISO red teaming standard. SaferAI is the only NGO member of the OECD G7 Hiroshima Process task force. Founding member of the US NIST AI Safety Institute Consortium. Henry Papadatos contributed to the EU Code of Practice for GPAI models.

EU CBRN contract (Feb 2026): Selected by EU AI Office for a 3-year contract to lead risk modeling and monitoring for CBRN threats from GPAI models. Consortium with FAR.AI (lead), SecureBio, GovAI, Nemesys Insights, Equistamp.

Quantitative risk research (Dec 2025): Three papers released simultaneously: risk modeling methodology, 9 detailed cyber risk models using MITRE ATT&CK framework, and LLM-simulated expert elicitation (using Claude 3.7 Sonnet with 5 expert personas to scale risk estimation). The LLM approach achieved R-squared of 0.46 for predicting task difficulty and estimates that fell between two human expert groups.

Advisory: Collaborated with METR to develop G42's Frontier AI Safety Framework (published Feb 2025). SaferAI also rates G42 in its public ratings.

Key People

Simeon Campos -- Founder, Board Chairman (previously ED). Co-founded EffiSciences (AI alignment field-building, France) before SaferAI. Technical expert at ISO/IEC and CEN-CENELEC. Foresight Institute fellow. Short timelines worldview. Strategic thinker who identified the auditing/standards niche early. Transitioned from ED to Chairman; Henry Papadatos now runs operations.

Henry Papadatos -- Executive Director. Background: LLM sycophancy research at CHAI/UC Berkeley, NeurIPS publication. OECD AI Risk & Accountability expert group. EU Code of Practice contributor. Writes op-eds in TIME. Bridges technical AI research and governance policy.

Malcolm Murray -- Research Lead. Certified Superforecaster (Good Judgment Project top 2%). 20+ years risk experience (ex-Gartner Chief of Research). CFA, MBA INSEAD. Chapter Lead for International AI Safety Report. Runs the Delphi studies that underpin SaferAI's quantitative risk methodology.

Team size: ~15, with unusual backgrounds for an AI safety org: spacecraft engineer, automotive safety expert, particle physicist, ex-Microsoft EU Government Affairs director, ex-OECD strategic foresight head. Deliberately recruited from industries with established risk management cultures.

Money and Incentives

Total budget: unknown. SaferAI does not publish financial statements. It states that it has "multiple institutional and private funders from the wider AI safety space" but "cannot disclose the full list publicly," and gives no reason for the restriction.

Confirmed funding:

Founders Pledge: "primary funder" per TIME (amount undisclosed)
Jaan Tallinn: "primary funder" per TIME (amount undisclosed; also funds SaferAI through SFF)
SFF 2025: $311K recommended ($307K Freedom Track + $4K Fairness Track), plus $90K speculation grant and $111K matching pledge
Manifund: ~$125K from a single donor
EU AI Office: CBRN consortium contract (3-year, amount undisclosed but part of EUR 9M total tender)
Total confirmed public: ~$637K. Actual budget likely $1.5-3M given ~15 staff in Paris.
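
The confirmed-public total can be reproduced from the line items above. The following is a minimal sketch, not SaferAI's own accounting. It assumes the SFF speculation grant and matching pledge are counted in full, and it leaves out the Founders Pledge, direct Jaan Tallinn, and EU contract amounts because they are undisclosed. The variable names are illustrative.

```python
# Publicly confirmed amounts from the list above, in thousands of USD.
# Assumption: the matching pledge and speculation grant count in full.
confirmed_k = {
    "SFF 2025 recommended grant": 311,  # 307 Freedom Track + 4 Fairness Track
    "SFF speculation grant": 90,
    "SFF matching pledge": 111,
    "Manifund (single donor)": 125,     # approximate figure
}

total_k = sum(confirmed_k.values())

# Budget range and headcount as estimated in the text.
budget_low_k, budget_high_k = 1500, 3000
staff = 15

# Share of the estimated budget that is publicly accounted for.
share_if_high_budget = total_k / budget_high_k
share_if_low_budget = total_k / budget_low_k

# Implied all-in cost per staff member at each end of the range.
per_head_low_k = budget_low_k / staff
per_head_high_k = budget_high_k / staff

print(f"Confirmed public total: ~${total_k}K")
print(f"Share of estimated budget: {share_if_high_budget:.0%} to {share_if_low_budget:.0%}")
print(f"Implied cost per head: ${per_head_low_k:.0f}K to ${per_head_high_k:.0f}K")
```

Under these assumptions, the publicly confirmed amounts cover well under half of the estimated budget, which is the gap the funder-concentration and opacity concerns rest on.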

Business model: Grants/donations plus new government contract revenue. Actively developing commercial advisory revenue (Gabor Szorad, Product Advisor, is working to "increase financial independence and deliver benefits to companies interested in responsible AI"). Zero CG/Open Philanthropy grants.

Funder concentration: Jaan Tallinn appears to be the single largest funding source across multiple vehicles (direct, Founders Pledge, SFF). One-person dependency risk.

Key incentive concern -- advisory/rating dual role: SaferAI advised G42 on developing their safety framework, then rated G42 in its public ratings system. If the commercial advisory business grows, this conflict pattern could become structural. The same dynamic corrupted credit rating agencies in the financial sector. SaferAI has not publicly addressed this conflict.

Key governance concern -- Fabien Roger: SaferAI board member and technical advisor Fabien Roger is simultaneously Member of Technical Staff at Anthropic. SaferAI rates Anthropic. Mitigating evidence: SaferAI downgraded Anthropic's score and called their RSP changes "very bad process." But the structural conflict exists regardless of how it has played out so far.

What Others Say

Endorsements: Yoshua Bengio endorsed the ratings: "we can't let them grade their own homework." The FLI Safety Index uses SaferAI's methodology for its Safety Frameworks domain. TIME has covered SaferAI three times.

Company responses: Google DeepMind's response to its rating: "our comprehensive approach to AI safety extends well beyond what's captured." Anthropic did not respond to a request for comment when its score declined.

Structural criticism (not about SaferAI specifically): Stuart Russell (FLI panel): "None of the current activity provides any kind of quantitative guarantee of safety; nor does it seem possible to provide such guarantees given the current approach to AI via giant black boxes." This challenges the premise that aviation-style risk frameworks can work for AI. The frontier AI auditing paper (Jan 2026) argues that meaningful assurance requires deep access to non-public information -- SaferAI's public-information-only approach maps to the lowest assurance level in that framework.

Absence of criticism: Despite extensive searching, zero direct criticism of SaferAI was found -- no forum posts, no critical articles, no public debate about methodology or independence. SaferAI is likely too small to attract dedicated critics, and the AI safety community may broadly support external rating efforts even if the specific methodology has limitations.

What's Absent

No published financial statements or annual report. For a nonprofit that rates companies on transparency, this is a notable gap.
No published conflict of interest policy. Two active conflicts (Fabien Roger/Anthropic, G42 advisory/rating) exist without documented management procedures.
No external validation of SaferAI's own ratings methodology. SaferAI rates companies on third-party verification but has not subjected its own methodology to independent review.
No impact assessment. After two rounds of ratings, SaferAI has not published data on whether companies improved practices in response. Two companies' scores declined in the second round.
No information about advisory clients beyond G42. SaferAI mentions advising "a major AI company" but doesn't name it. The full extent of potential rating conflicts is unknown.
No discussion of limitations on the public-facing ratings site. The academic papers acknowledge limitations; the ratings product does not.

Stated Theory of Change

SaferAI states a clear three-part mechanism for reducing AI risk:

Measure: Develop quantitative risk assessment tools that translate benchmark capabilities into real-world risk estimates, using methodologies adapted from aviation, nuclear, and financial risk management.
Rate: Publicly score frontier AI companies' risk management practices against these tools, creating reputational pressure and providing data for regulators, investors, and the public.
Standardize: Embed SaferAI's risk management philosophy into international standards (EU AI Act, CEN-CENELEC, ISO, OECD, NIST) so that regulatory infrastructure institutionalizes aviation-style safety requirements for AI.

The underlying assumption is that alignment tools will eventually exist, but labs won't implement them without external pressure. Simeon Campos framed it explicitly: SaferAI is the enforcement mechanism for alignment work done by others. The causal chain: independent measurement -> public accountability -> regulatory standards -> mandatory implementation -> reduced AI risk.

Revealed Theory of Change

SaferAI's actions mostly align with its stated theory, with some notable divergences:

Consistent with stated theory:

The ratings product exists and has been released twice with expanding coverage (6 companies, then 12)
Standards body participation is extensive and deep (writing standards, not just commenting)
The EU CBRN contract represents the theory working: SaferAI's risk modeling expertise is being directly used by regulators
The FLI Safety Index collaboration multiplies SaferAI's reach through a more established partner
The quantitative risk modeling papers provide the technical foundation for the entire project

Divergences from stated theory:

The stated emphasis on "independence" is complicated by the advisory business. Advising G42 on their framework and then rating it reveals a consulting-and-rating dual role that the stated theory doesn't acknowledge.
The commercial revenue development (Gabor Szorad) suggests SaferAI may be moving toward a model where companies PAY for advice and then get RATED by the same organization. This hasn't happened yet, but the trajectory points there.
The financial opacity contradicts the transparency principles SaferAI applies to others. If the theory of change depends on credibility as an independent assessor, and credibility depends on demonstrated independence, then withholding the funder list undermines the mechanism.

Key Assumptions

Assumption 1: Public-information-based ratings create meaningful pressure on companies.

Evidence for: xAI improved from 0% to 18%, Meta from 14% to 22% between the two rating rounds. TIME, NBC, Fortune cover the ratings. Yoshua Bengio endorses them.
Evidence against: Anthropic's and Google DeepMind's scores declined. Companies can and do dismiss the ratings as incomplete ("our comprehensive approach extends well beyond what's captured" -- Google). No evidence that any company changed practices specifically because of SaferAI ratings vs. other pressures.
Testable: Track whether companies' scores improve over multiple rounds and whether improvements correlate with rating-specific concerns.
If wrong: SaferAI becomes a scorekeeper that no one listens to -- all measurement, no influence.
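
The tracking proposed under "Testable" could start from something as simple as the sketch below. It uses only the two companies whose first- and second-round scores are both stated in this profile. Anthropic and Google DeepMind are reported as declining, but their first-round scores are not given here, so they are left out. The function name `score_changes` is hypothetical.

```python
# Scores (percent) for companies with both rating rounds stated in the text.
round_scores = {
    "xAI": (0, 18),
    "Meta": (14, 22),
}


def score_changes(scores):
    """Return the percentage-point change per company between two rounds."""
    return {name: second - first for name, (first, second) in scores.items()}


changes = score_changes(round_scores)

# List the largest movers first.
for name, delta in sorted(changes.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {delta:+d} pp")
```

A positive change shows only that a score moved. Attributing that movement to the ratings rather than to other pressures would need the additional evidence described above.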

Assumption 2: Aviation/nuclear risk management frameworks can be meaningfully adapted to AI.

Evidence for: The risk management literature is mature and industry-agnostic at the framework level. The specific adaptations SaferAI proposes (KRI/KCI thresholds, risk modeling, Delphi studies) are concrete. The EU Code of Practice has adopted several elements SaferAI advocated for.
Evidence against: Stuart Russell: "none of the current activity provides any kind of quantitative guarantee of safety; nor does it seem possible to provide such guarantees given the current approach to AI via giant black boxes." AI systems change fundamentally with each new generation in ways that planes and nuclear reactors don't. The threat landscape is adversarial in a way that aviation isn't.
Testable: Whether the quantitative risk models produce estimates that correlate with actual incidents/harms as capabilities advance.
If wrong: SaferAI is importing a methodology from a domain where it worked into a domain where it may be fundamentally inapplicable.

Assumption 3: Standards bodies are the right lever for influencing AI development.

Evidence for: The EU AI Act is the most comprehensive AI regulation globally. Standards bodies shape what compliance looks like. SaferAI has people literally writing the standards. Historical precedent: automotive safety standards transformed the industry.
Evidence against: The US has rolled back AI oversight under the current administration. Chinese companies are outside the EU's regulatory reach. 46 tech CEOs sent a letter asking for a two-year pause on EU GPAI obligations. Enforcement is uncertain.
Testable: Whether EU standards propagate globally or remain regional. Whether enforcement actions follow from the standards SaferAI helped write.
If wrong: SaferAI has invested heavily in a European regulatory framework that may be geopolitically contained while the most important AI development happens in the US and China.

Assumption 4: SaferAI can maintain independence while developing commercial relationships with the companies it rates.

Evidence for: The Anthropic downgrade while Fabien Roger sits on the board. The right-of-reply process is transparent.
Evidence against: The G42 advisory/rating dual role. The trajectory toward commercial advisory revenue. The absence of a published conflict of interest policy.
Testable: Whether SaferAI publishes a COI policy, discloses all advisory relationships, and shows consistent willingness to rate clients poorly.
If wrong: SaferAI becomes captured by the industry it's supposed to assess -- the exact failure mode that the 2008 financial crisis revealed in credit rating agencies.

Strengths

Unusually concrete and measurable theory of change. Most AI safety orgs have vague theories of change ("do alignment research" or "influence the policy conversation"). SaferAI's three-pronged approach -- measure, rate, standardize -- has clear intermediate steps and observable outputs. You can point to the ratings, the standards contributions, and the EU contract as evidence of the mechanism working.
Strong team for the specific niche. A Superforecaster running risk modeling, a spacecraft engineer writing standards, an ex-OECD policy head advising on governance, an ex-Berkeley researcher running operations. This is not a typical EA org; the team was deliberately assembled from industries with risk management cultures.
Media reach disproportionate to size. Three TIME features, a TIME op-ed, IEEE Spectrum, NBC, Fortune -- for a ~15-person French nonprofit. This amplifies the ratings' accountability pressure well beyond SaferAI's direct reach.
EU CBRN contract is a genuine inflection point. Government contract revenue, multi-year stability, direct influence on EU regulatory enforcement. This is the theory of change actually working: SaferAI's expertise is being purchased by the people who will enforce AI regulation.
Standards body access is a structural advantage. People literally writing the standards that will shape regulation. This is slow, unsexy, high-leverage work that few AI safety orgs pursue.
Novel technical contributions. The LLM-simulated expert elicitation approach is genuinely innovative. If validated, it makes quantitative risk assessment dramatically more scalable. The cyber risk models are the most detailed proof-of-concept available.

Weaknesses and Risks

Financial opacity undermines credibility. SaferAI rates companies on transparency but doesn't practice it for its own finances. No published budget, no annual report, undisclosed funders. For an org whose value depends on being an independent, credible assessor, this is a significant vulnerability.
Advisory/rating dual role creates structural conflicts. The G42 case -- advising on a framework and then rating it -- mirrors the pre-2008 credit rating agency problem. As commercial revenue grows, this conflict could become systemic. SaferAI has no published COI policy to manage it.
Fabien Roger conflict is unaddressed structurally. Board member works at the highest-rated company. The behavioral evidence suggests independence (Anthropic was downgraded), but the structural conflict should be addressed through formal recusal procedures, not just good behavior.
Public-information-only methodology has fundamental limitations. SaferAI rates what companies CLAIM they do, not what they ACTUALLY do. The frontier auditing paper argues that meaningful assurance requires deep access to non-public information. SaferAI's approach maps to the lowest assurance level in that framework.
Funder concentration risk. Jaan Tallinn appears to be the largest funding source across multiple vehicles. Single-individual dependency is fragile.
European focus in an increasingly US/China-dominated field. SaferAI's deepest engagement is with EU institutions. The EU AI Act matters, but the most consequential AI development happens at US labs and increasingly Chinese ones. US regulatory rollback and Chinese opacity limit the reach of EU-centric standards.
No demonstrated causal impact. After two rating rounds, there's no evidence that SaferAI specifically caused companies to improve (vs. other pressures driving the same changes). Two companies' scores actually declined. Without causal impact evidence, the ratings are informative but not demonstrably effective.

Cross-References

FLI (Future of Life Institute): Direct partner. SaferAI provides the Safety Frameworks domain analysis for the FLI AI Safety Index. FLI provides a larger platform that amplifies SaferAI's ratings.
FAR.AI: CBRN consortium partner with FAR.AI leading. Collaborative relationship.
METR: Collaborated on G42 framework. METR does model evaluations (different from SaferAI's framework-level assessment).
GovAI: Malcolm Murray is a Research Affiliate. GovAI is a CBRN consortium partner.
Anthropic: Fabien Roger dual role. SaferAI rates Anthropic; Anthropic generally scores highest.
EffiSciences/CeSIA: French AI safety ecosystem that Simeon Campos helped build. Relationship unclear.
Open Philanthropy/CG: Notably absent. SaferAI has zero grants from the largest AI safety funder.
Credit rating agencies (analogy): SaferAI's model closely mirrors S&P/Moody's in structure and faces the same structural risks -- particularly the consulting-and-rating dual role.

What Would Change This Assessment

SaferAI publishing a comprehensive annual report with full financial disclosure would significantly increase confidence in independence. This is the single most impactful thing they could do.
SaferAI publishing a formal COI policy addressing Fabien Roger, advisory relationships, and future conflicts would address the governance gap.
Evidence that a company changed specific practices citing SaferAI ratings as the reason would demonstrate causal impact, not just correlation.
A substantive public critique of SaferAI's methodology by a credible external party would test whether their framework holds up under scrutiny.
Loss of the EU CBRN contract or failure to renew would suggest SaferAI's work isn't valued by the regulators it's trying to influence.
SaferAI gaining access to non-public company information (moving up the assurance-level (AAL) ladder in the frontier auditing framework) would substantially strengthen their assessment capability.
A company threatening or taking legal action over a rating would test SaferAI's institutional resilience and independence.

Self-Critique

What sources should I have checked but didn't?

French-language podcasts (The Flares, Le Futurologue) -- could contain additional candid material
Le Monde coverage referenced in bios -- could not be found
Manifund project page comments -- could not be fetched (429 errors)
Euractiv article -- blocked by site
LinkedIn discussions about SaferAI

Where is this analysis potentially biased?

I may be overly critical of the governance gaps because the analyst instructions prime me to look for conflicts of interest. The advisory/rating dual role is a genuine concern, but SaferAI is very early-stage and may address it as it matures.
I may be underweighting the value of SaferAI's standards work because it's less visible than the ratings. Standards participation is harder to evaluate from the outside but may be the highest-impact activity.
The absence of criticism could be over-interpreted. Small orgs often don't attract critics; absence of evidence isn't evidence of problems.

What would a thoughtful person who disagrees say? "SaferAI is one of the only organizations actually trying to build the measurement infrastructure that AI regulation will need. You're criticizing them for governance issues that every 3-year-old nonprofit has, while ignoring that they've already achieved EU government contracts, FLI partnerships, standards body seats, and TIME coverage with 15 people. The Fabien Roger conflict is managed in practice even if not in policy. The financial opacity is a French nonprofit norm, not a red flag. The real question is whether they'll scale fast enough to matter, not whether their COI policy is published."

What's my single weakest claim? My claim that the advisory/rating dual role is structurally concerning in a way analogous to pre-2008 credit rating agencies. The analogy may be too strong -- credit rating agencies were operating in a $600B+ CDO market with massive financial stakes; SaferAI is a tiny nonprofit whose ratings don't directly drive investment decisions. The incentive pressure is orders of magnitude smaller.

What information would most change my view? SaferAI's complete funder list and budget. If it turns out that SaferAI is primarily funded by entities with no financial interest in AI companies (e.g., exclusively Jaan Tallinn and similar EA funders), the independence concern weakens substantially. If it turns out that AI companies themselves are significant funders, the concern strengthens dramatically.

Connected to (13)

CEN-CENELEC JTC 21: collaborator (James Gealy)
Centre for the Governance of AI: collaborator (Malcolm Murray)
FAR.AI: collaborator
SecureBio: collaborator
G42: advisor at
METR: collaborator
Anthropic: board overlap (Fabien Roger)
Founders Pledge: collaborator
Future of Life Institute: collaborator
OECD: collaborator (Simeon Campos)
Redwood Research: board overlap (Fabien Roger)
UC Berkeley / CHAI: staff from (Henry Papadatos)
EffiSciences: spun off from (Simeon Campos)
