Theory of Change
SaferAI's theory of change: AI risk management is decades behind other high-risk industries; external pressure through independent ratings, quantitative risk modeling, and embedding in international standards bodies can close this gap.
Founder Simeon Campos, in a 2022 interview: "AI auditing is one of the most promising way to ensure that if at some point there are existing alignment tools, they will be implemented actually by top organization and top labs that are developing AGI." SaferAI was conceived as the enforcement arm for alignment work done by others -- the bet is that labs might develop safety tools but won't use them without external accountability.
The implementation has three prongs:
- Ratings: Publicly score frontier AI companies' risk management practices, creating reputational pressure to improve
- Standards: Embed SaferAI personnel in the bodies writing AI standards (CEN-CENELEC, ISO, OECD, EU Code of Practice, NIST AISIC) so that aviation/nuclear-style risk management principles get baked into regulatory infrastructure
- Risk modeling: Develop quantitative methods to translate AI benchmark scores into real-world risk estimates, giving regulators concrete tools rather than abstract frameworks
What They Do
Ratings (flagship product): First released Oct 2024 with 6 companies, expanded to 12 in Jul 2025. Framework has 65 criteria across risk identification, risk tolerance/analysis, risk mitigation, and risk governance. Scores: Anthropic 35%, OpenAI 33%, Meta 22%, Google DeepMind 20%, xAI 18%. No company scored above "weak." Companies get two weeks' notice and right of reply before publication. SaferAI provides the Safety Frameworks domain analysis for the FLI AI Safety Index (all editions).
Standards engagement: James Gealy leads the high-risk AI risk management standard at CEN-CENELEC JTC 21 and edits the ISO red teaming standard. SaferAI is the only NGO member of the OECD G7 Hiroshima Process task force. Founding member of the US NIST AI Safety Institute Consortium. Henry Papadatos contributed to the EU Code of Practice for GPAI models.
EU CBRN contract (Feb 2026): Selected by the EU AI Office for a 3-year contract to lead risk modeling and monitoring for CBRN threats from GPAI models. The work is delivered through a consortium led by FAR.AI that also includes SecureBio, GovAI, Nemesys Insights, and Equistamp.
Quantitative risk research (Dec 2025): Three papers released simultaneously: a risk modeling methodology, 9 detailed cyber risk models built on the MITRE ATT&CK framework, and LLM-simulated expert elicitation (using Claude 3.7 Sonnet with 5 expert personas to scale risk estimation). The LLM approach achieved an R-squared of 0.46 for predicting task difficulty, and its estimates fell between those of two human expert groups.
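For readers unfamiliar with the metric, the sketch below shows one common way an R-squared figure such as 0.46 is computed: as the share of variance in observed values that a set of predictions accounts for. It assumes the standard coefficient-of-determination definition (1 - SS_res / SS_tot); the report may use a different variant, such as a squared correlation. The function name and the toy numbers are hypothetical and are not SaferAI data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the textbook definition; it is not confirmed
    to be the exact variant used in the SaferAI report.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after using the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, purely illustrative (not from the report).
toy_observed = [1.0, 2.0, 3.0]
perfect = r_squared(toy_observed, [1.0, 2.0, 3.0])      # predictions match exactly
mean_only = r_squared(toy_observed, [2.0, 2.0, 2.0])    # no better than the mean
```

Under this definition a value of 1.0 means the predictions match the observations exactly and 0.0 means they do no better than always guessing the mean, so 0.46 sits roughly midway between the two.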
Advisory: Collaborated with METR to develop G42's Frontier AI Safety Framework (published Feb 2025). SaferAI also rates G42 in its public ratings.
Key People
Simeon Campos -- Founder and Board Chairman. Moved from Executive Director to Chairman; Henry Papadatos now runs operations. Co-founded EffiSciences (AI alignment field-building, France) before SaferAI. Technical expert at ISO/IEC and CEN-CENELEC. Foresight Institute fellow. Holds a short-timelines worldview. Identified the auditing/standards niche early.
Henry Papadatos -- Executive Director. Background: LLM sycophancy research at CHAI/UC Berkeley, NeurIPS publication. Member of the OECD AI Risk & Accountability expert group. EU Code of Practice contributor. Writes op-eds for TIME. Bridges technical AI research and governance policy.
Malcolm Murray -- Research Lead. Certified Superforecaster (Good Judgment Project top 2%). 20+ years risk experience (ex-Gartner Chief of Research). CFA, MBA INSEAD. Chapter Lead for International AI Safety Report. Runs the Delphi studies that underpin SaferAI's quantitative risk methodology.
Team size: ~15, with unusual backgrounds for an AI safety org: spacecraft engineer, automotive safety expert, particle physicist, ex-Microsoft EU Government Affairs director, ex-OECD strategic foresight head. Deliberately recruited from industries with established risk management cultures.
Money and Incentives
Total budget: unknown. SaferAI does not publish financial statements. It states that it has "multiple institutional and private funders from the wider AI safety space" but "cannot disclose the full list publicly," and it gives no reason for the non-disclosure.
Confirmed funding:
- Founders Pledge: "primary funder" per TIME (amount undisclosed)
- Jaan Tallinn: "primary funder" per TIME (amount undisclosed; also funds SaferAI through SFF)
- SFF 2025: $311K recommended ($307K Freedom Track + $4K Fairness Track), plus $90K speculation grant and $111K matching pledge
- Manifund: ~$125K from a single donor
- EU AI Office: CBRN consortium contract (3-year, amount undisclosed but part of EUR 9M total tender)
- Total confirmed public: ~$637K (SFF $311K + $90K + $111K, plus Manifund ~$125K; undisclosed amounts excluded). Actual budget likely $1.5-3M given ~15 staff in Paris.
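The confirmed total can be reproduced from the disclosed line items. The short sketch below sums them and compares the result with the estimated budget range; the variable names are illustrative, and the $1.5-3M range is the estimate given above, not a published figure.

```python
# Publicly disclosed amounts, in thousands of USD, as listed above.
# Founders Pledge, Jaan Tallinn (direct), and the EU contract are
# omitted because their amounts are undisclosed.
confirmed_kusd = {
    "SFF 2025 recommended (Freedom + Fairness)": 307 + 4,
    "SFF speculation grant": 90,
    "SFF matching pledge": 111,
    "Manifund (single donor, approximate)": 125,
}

total_confirmed_kusd = sum(confirmed_kusd.values())

# Estimated annual budget range from the text (an estimate, not a reported figure).
budget_low_kusd, budget_high_kusd = 1500, 3000

# Share of the estimated budget explained by confirmed public funding.
share_if_high_budget = total_confirmed_kusd / budget_high_kusd
share_if_low_budget = total_confirmed_kusd / budget_low_kusd

print(f"Confirmed public funding: ~${total_confirmed_kusd}K")
print(f"Share of estimated budget: {share_if_high_budget:.0%} to {share_if_low_budget:.0%}")
```

On these numbers, confirmed public funding covers roughly 21% to 42% of the estimated budget, which leaves most of it attributable to undisclosed sources. The comparison is loose, since a matching pledge and multi-year grants do not map cleanly onto a single year's budget.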
Business model: Grants/donations plus new government contract revenue. Actively developing commercial advisory revenue (Gabor Szorad, Product Advisor, is working to "increase financial independence and deliver benefits to companies interested in responsible AI"). No grants from Coefficient Giving (CG, formerly Open Philanthropy).
Funder concentration: Jaan Tallinn appears to be the single largest funding source across multiple vehicles (direct, Founders Pledge, SFF). One-person dependency risk.
Key incentive concern -- advisory/rating dual role: SaferAI advised G42 on developing its safety framework, then rated G42 in its public ratings system. If the commercial advisory business grows, this conflict pattern could become structural. A similar dual role, selling advice to the same entities being rated, undermined the independence of credit rating agencies in the financial sector. SaferAI has not publicly addressed this conflict.
Key governance concern -- Fabien Roger: SaferAI board member and technical advisor Fabien Roger is simultaneously Member of Technical Staff at Anthropic. SaferAI rates Anthropic. Mitigating evidence: SaferAI downgraded Anthropic's score and called their RSP changes "very bad process." But the structural conflict exists regardless of how it has played out so far.
What Others Say
Endorsements: Yoshua Bengio endorsed the ratings: "we can't let them grade their own homework." The FLI AI Safety Index uses SaferAI's methodology for its Safety Frameworks domain. TIME has covered SaferAI three times.
Company responses: Google DeepMind's response to its rating: "our comprehensive approach to AI safety extends well beyond what's captured." Anthropic did not respond to a request for comment when its score declined.
Structural criticism (not about SaferAI specifically): Stuart Russell (FLI panel): "None of the current activity provides any kind of quantitative guarantee of safety; nor does it seem possible to provide such guarantees given the current approach to AI via giant black boxes." This challenges the premise that aviation-style risk frameworks can work for AI. The frontier AI auditing paper (Jan 2026) argues that meaningful assurance requires deep access to non-public information -- SaferAI's public-information-only approach maps to the lowest assurance level in that framework.
Absence of criticism: Despite extensive searching, zero direct criticism of SaferAI was found -- no forum posts, no critical articles, no public debate about methodology or independence. SaferAI is likely too small to attract dedicated critics, and the AI safety community may broadly support external rating efforts even if the specific methodology has limitations.
What's Absent
- No published financial statements or annual report. For a nonprofit that rates companies on transparency, this is a notable gap.
- No published conflict of interest policy. Two active conflicts (Fabien Roger/Anthropic, G42 advisory/rating) exist without documented management procedures.
- No external validation of SaferAI's own ratings methodology. SaferAI rates companies on third-party verification but has not subjected its own methodology to independent review.
- No impact assessment. After two rounds of ratings, SaferAI has not published data on whether companies improved practices in response. Two companies' scores declined in the second round.
- No information about advisory clients beyond G42. SaferAI mentions advising "a major AI company" but doesn't name it. The full extent of potential rating conflicts is unknown.
- No discussion of limitations on the public-facing ratings site. The academic papers acknowledge limitations; the ratings product does not.
Recommended Reading
Inside View podcast: Simeon Campos (recorded Oct 2022, published Apr 2023) -- 2-hour interview revealing the founding worldview, short-timelines thinking, and strategy for AI auditing. The most candid source on why SaferAI exists and how the founder thinks. https://theinsideview.ai/simeon
Frontier AI Auditing paper (Jan 2026) -- Not about SaferAI, but the most rigorous treatment of the structural challenges SaferAI faces: information asymmetry, self-reporting bias, independence concerns. Proposes AI Assurance Levels that place SaferAI's current approach at the lowest rung. The strongest implicit counterargument to SaferAI's methodology. https://arxiv.org/html/2601.11699v1
TIME: "Top AI Firms Fall Short on Safety" (Jul 2025) -- Best single-article overview of SaferAI's ratings results and willingness to criticize even top-rated Anthropic. Shows SaferAI's media reach relative to its size. https://time.com/7302757/anthropic-xai-meta-openai-risk-management-2/
Henry Papadatos: "How the EU's Code of Practice Advances AI Safety" -- Detailed, technically sophisticated policy analysis showing SaferAI's influence on EU regulation and their analytical depth. https://ai-frontiers.org/articles/how-the-eus-code-of-practice-advances-ai-safety
LLM-simulated expert judgement technical report -- SaferAI's most novel technical contribution. Uses LLMs to scale up quantitative risk estimation. Shows both promise and current limitations. https://www.safer-ai.org/technical-report-llm-simulated-expert-judgement-for-quantitative-ai-risk-estimation