Theory of Change
FRI aims to improve decision-making on catastrophic risks by advancing the science of forecasting. The causal chain: better forecasting methods produce more reliable probability estimates, which create shared baselines for policy discussions, which lead to better-calibrated responses to existential risk.
Cofounder Ezra Karger draws an analogy to the Federal Reserve's Survey of Professional Forecasters: "If we're going to continue to have discussions about existential risks, it seems useful to have forecasts that we in the future will track over time that tell us how people's beliefs about risks are changing." He frames FRI's value not as producing definitive numbers but as making implicit beliefs explicit: "Experts think this; accurate forecasters think this. They might both be wrong, but we can at least start from here."
CEO Josh Rosenberg identifies the specific gap: "There still is this big gap where it seems like [forecasting is] not being taken up as much by important decision makers as it could be." FRI pursues the two-pronged strategy laid out in its 2022 founding announcement: foundational research (handling low-probability events, long-run outcomes, complex topics) and translational work (making forecasting decision-relevant, mapping disagreements, identifying useful contexts).
What They Do
Existential Risk Persuasion Tournament (XPT, 2022): 169 participants produced the most widely cited dataset of x-risk probability estimates. Domain experts: 6% extinction by 2100, 20% catastrophic risk. Superforecasters: 1% extinction, 9% catastrophic. AI specifically: 3% extinction (experts) vs. 0.38% (superforecasters). The 750,000x question-wording effect was discovered here -- changing the response format from "percentage" to "1-in-X with examples" shifted median extinction estimates from 5% to 1-in-15-million.
AI Adversarial Collaboration (2023): 22 participants debated AI risk for 8 weeks (median 80 hours for skeptics). "Concerned" group: 20% AI extinction. "Skeptical" group: 0.12%. Despite extensive engagement, views barely converged. The best short-term crux (METR finding dangerous AI capabilities) would close only 5% of the disagreement gap. Key finding: both groups agreed powerful AI would be developed by 2100 (90% and 88% respectively) but disagreed on timing -- the skeptics' median date for AI displacing humans was 2450, the concerned group's 2045.
LEAP (launched June 2025): 339 experts across CS, industry, economics, and policy, plus 60 superforecasters and 1,400 public participants. Monthly surveys over 3 years. Key findings: median expert predicts 18% of work hours assisted by generative AI by 2030; experts predict significantly less progress than frontier lab CEOs but more than the public. In Wave 4 (Dec 2025), the median expert predicted 14% AI accuracy on LiveCodeBenchPro Hard -- GPT-5.2 hit 33% shortly after.
ForecastBench (Sept 2024): Benchmark comparing LLM vs. human forecasting accuracy. Accepted at ICLR 2025. GPT-4.5 beats the public median but not superforecasters; AI parity with superforecasters is projected for late 2026.
Nuclear and Biorisk Forecasting: Nuclear study (110 experts + 41 superforecasters) found 1-5% catastrophe risk by 2045 and identified six tractable policies. Biorisk study found AI milestones would increase epidemic risk from 0.3% to 1.5%, but mitigation measures could reduce it back to baseline. In both domains, experts underestimated how quickly AI capabilities would advance.
Policy engagement: Contributed capability forecasts to the International AI Safety Report 2026. UN First Committee side event on nuclear risk. GiveWell contract ($200K) for water chlorination forecasting -- FRI's first non-x-risk applied project.
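The 750,000x question-wording figure cited for the XPT follows directly from the two reported medians. A minimal sketch of the arithmetic, using exact fractions to avoid floating-point noise (the variable names are illustrative, not FRI's):

```python
from fractions import Fraction

# Median extinction estimates under the two response formats, as reported above.
percent_format = Fraction(5, 100)           # answer given as "5%"
one_in_x_format = Fraction(1, 15_000_000)   # answer given as "1-in-15-million"

# How many times larger the percentage-format median is.
ratio = percent_format / one_in_x_format
print(f"{int(ratio):,}x")  # 750,000x
```

The ratio compares medians across formats only; it says nothing about which format is closer to the truth.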
Key People
Philip Tetlock -- President, Chief Scientist. PIK Professor at UPenn (Wharton + SAS). Author of Expert Political Judgment (2005) and Superforecasting (2015). Co-created the Good Judgment Project (IARPA tournament winner). Also co-founder/advisor of Good Judgment Inc. (commercial forecasting). FRI is essentially "Tetlock's academic research program given institutional form."
Josh Rosenberg -- CEO (since mid-2023, replacing Page Hedley after 7 months). Former Senior Advisor at GiveWell. Focused on making forecasting decision-relevant for policymakers.
Ezra Karger -- Director of Research, Cofounder. Senior Economist at the Federal Reserve Bank of Chicago. PhD Economics, U of Chicago. Himself a superforecaster from IARPA tournaments. Receives no compensation from FRI. Also an advisor to Open Philanthropy (FRI's primary funder).
Team is ~15-20 core staff plus ~25 RAs/consultants. Remote/distributed. Board of 5 includes Tetlock and Rosenberg plus three academics.
Money and Incentives
Total known funding: ~$18.5M. Nearly 100% from a single source:
| Source | Amount | Date Range |
|---|---|---|
| Coefficient Giving / Open Phil | $18,325,525 | 2022-2025 (9 grants) |
| GiveWell | $200,808 | 2025 |
The two largest grants: $10.08M general support (Feb 2025, 3-year) and $6.3M Science of Forecasting (Dec 2022, 3-year). Project-specific grants for LEAP ($1.07M), tripwire evaluations ($359K), ForecastBench ($100K), and others.
Financial data (990): 2023 expenses were $1.44M with $360K top compensation and $4.15M in net assets. The organization is still spending down initial grants -- the $10.08M 2025 grant significantly extends the runway.
Business model: Purely philanthropic grants. No product revenue, no government contracts, no commercial clients (the GiveWell project is itself grant-funded).
Funder dependency is extreme. Open Phil/Coefficient Giving provides about 99% of all known funding ($18.33M of $18.53M). This is the highest funder concentration of any AI safety org examined. Open Phil's "Forecasting" focus area (~$50M across 30+ grants) treats FRI as the primary vehicle.
Structural conflict: Ezra Karger (FRI's research director, uncompensated) simultaneously advises Open Philanthropy (FRI's near-sole funder). This is an unusual degree of entanglement between researcher and funder.
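The concentration figure can be checked against the funding table. A minimal sketch, assuming the two rows in the table are the complete set of known funding (the dictionary and variable names are illustrative):

```python
# Known funding by source, in USD, taken from the table above.
funding = {
    "Coefficient Giving / Open Phil": 18_325_525,
    "GiveWell": 200_808,
}

total = sum(funding.values())  # 18,526,333
shares = {source: amount / total for source, amount in funding.items()}

for source, share in shares.items():
    print(f"{source}: {share:.1%}")
# Coefficient Giving / Open Phil: 98.9%
# GiveWell: 1.1%
```

Any funding not captured in the table would lower the single-funder share, so the result is best read as an upper bound on concentration.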
What Others Say
The fundamental methodological critique (Narayanan & Kapoor, AI Snake Oil/Normaltech, July 2024): "AI x-risk forecasts are far too unreliable to be useful for policy, and in fact highly misleading." They argue there's no reference class for AI extinction, subjective probabilities "are nothing more than guesses," forecast skill is undetectable for tail risks (proving one forecaster wrong over another on 1% vs. 0.001% events would require billions of observations), and FRI's own 750,000x question-wording effect proves "speculation gets laundered through pseudo-quantification." They cite FRI's XPT as "the most elaborate and well-executed x-risk forecasting exercise" while arguing its numbers are fundamentally unreliable. Notably, Karger was acknowledged for feedback on the paper.
The LEAP methodology error (EA Forum, Nov 2025, ~86 karma): A community member identified that FRI's LEAP survey (1) framed even the "slow progress" scenario as describing near-AGI, and (2) presented vote-share predictions as probabilities. A follow-up post provided a detailed breakdown of the error. FRI revised the report in response, which critics acknowledged positively.
Systematic AI underestimation (Epoch AI, Jan 2026): Independent analysis found XPT superforecasters gave only 9.7% average probability to observed AI progress outcomes (domain experts: 24.6%). This pattern persists across XPT, LEAP, biorisk, and virology benchmark timelines.
The adversarial collaboration reaction (BetterWithoutAI): "The experiment was a roaring success, in the sense that it confirmed that both groups were almost perfectly immune to evidence or arguments against their beliefs." Called both groups' estimates "essentially meaningless."
Joe Carlsmith on superforecaster numbers: After superforecasters reviewed his power-seeking AI report (giving 1% extinction risk vs. his 5%), he reported not updating heavily because their "written arguments haven't moved me much" and he's "unsure how much to defer to raw superforecaster numbers" for long-term questions.
Good Judgment Inc. on ForecastBench: Argued the benchmark's design advantages AI through a frozen human baseline, data-heavy questions, and multiple AI attempts. "Bullish is not the word here" -- superforecasters who predicted early AI parity simultaneously argued the milestone wouldn't mean much.
What's Absent
No documented policy impact. After $18M+ and 3+ years, there is no published case of any government, AI lab, or major institution changing a decision based on FRI forecasts. The contribution to the International AI Safety Report 2026 is the closest evidence.
Epistemic Audits disappeared. Listed in the founding announcement as a key service line. No evidence of any being conducted. The research page now references "Epistemic Reviews" as in progress.
Zero revenue diversification. No SFF, EA Funds, government grants, or private donors identified beyond Coefficient Giving/Open Phil and GiveWell.
No published conflict-of-interest policy despite multiple structural conflicts (Tetlock and Good Judgment Inc., Karger and Open Philanthropy, Coefficient Giving participant selection).
No self-assessment or track record evaluation. An org devoted to calibration and accuracy has never published a systematic review of its own forecasting track record.
Recommended Reading
80K Hours podcast with Ezra Karger (#200, Sept 2024) -- The most candid insider view. Karger is disarmingly honest about what FRI can and cannot do. 2h49m.
"AI existential risk probabilities are too unreliable to inform policy" (Normaltech, July 2024) -- The strongest counterargument to FRI's entire approach. Uses FRI's own data against them.
Results from an Adversarial Collaboration on AI Risk (March 2024) -- FRI's most honest output. Shows that structured debate doesn't resolve deep disagreements about AI risk.
How well did forecasters predict 2025 AI progress? (Epoch AI, Jan 2026) -- Independent evidence that the forecaster population FRI relies on systematically underestimates AI progress.
LEAP Waves 1-3 Insights Report (Nov 2025) -- The most ambitious current output: 339 experts forecasting AI trajectory with detailed rationales.