Quantified Uncertainty Research Institute (QURI)

Forecasting

Squiggle. Uncertainty quantification.

Founded: 2019
HQ: Berkeley, CA
Team: 1
Structure: fiscally sponsored
Model: Grants

Theory of Change

QURI's theory of change, in Ozzie Gooen's own words: "Our bigger agenda is... can we make these very advanced estimation infrastructure systems and then apply them to many things?" The causal chain runs: build tools for probabilistic estimation at scale --> enable people to cheaply estimate the value of everything --> better estimates lead to better coordination --> reduced global failures including AI risk.

Ozzie articulated the end state as an "estimation utopia" -- a world where "a lot of key items are parameterized and estimated, and the estimates are, we have justified trust in those estimates, they're good, and people know that they're good." In this world, you could evaluate every sentence of a government bill, score every intellectual's track record, and render warfare unprofitable through coordinated public action.

This is a "mistake theory" worldview: coordination failures are primarily estimation failures, not power struggles. Ozzie: "If you imagine replacing the 8 billion people that we have with people who are a thousand IQ... do you really think that the world would be so messy?"

Since 2023-2024, the theory of change has evolved toward "Epistemic AI" -- using LLMs to automate the evaluation and estimation work that humans have proven reluctant to do with manual tools.

What They Do

QURI is primarily a software organization. Active projects (as of early 2026):

Squiggle -- A domain-specific programming language for probabilistic estimation, running on JavaScript. In development since 2020 (Ozzie's Guesstimate preceded it from 2016). Version 0.10.0 released January 2025. Open source, MIT licensed. 99 GitHub stars, 20 forks.

Squiggle Hub -- Platform for sharing Squiggle code. Launched August 2023. Squiggle AI adds LLM-powered model generation (Claude integration). Documented adoption of Squiggle AI: 30 external users and 168 workflows, over an unspecified period.

RoastMyPost -- LLM-powered blog post evaluation tool, launched December 2025. Runs narrow automated checks (fact, spell, fallacy, math, link, forecast). Optimized for EA Forum and LessWrong content. Open source, free, explicitly experimental.

Longterm Wiki -- Strategic intelligence platform for AI safety. ~736 pages tracking 527 organizations and 696 people. Quality scoring system, causal diagrams, entity cross-linking.

Metaforecast -- Aggregates forecasts from multiple prediction platforms. Now in maintenance/archived mode.
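For readers unfamiliar with what a probabilistic estimation language like Squiggle does, the following is a minimal Python sketch of the underlying idea: uncertain inputs are expressed as distributions and propagated through a calculation by Monte Carlo sampling. It is not Squiggle syntax and uses no QURI code. The quantities (`hours_saved`, `value_per_hour`), their ranges, and the helper `lognormal_from_90ci` are hypothetical, chosen only for illustration.

```python
import math
import random

Z_95 = 1.6448536269514722  # standard normal quantile at 0.95


def lognormal_from_90ci(low, high, rng):
    """Draw one sample from a lognormal whose 5th/95th percentiles are low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    return rng.lognormvariate(mu, sigma)


def simulate(n=20_000, seed=0):
    """Propagate two uncertain inputs through a product; return sorted samples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        hours_saved = lognormal_from_90ci(10, 100, rng)    # hypothetical input
        value_per_hour = lognormal_from_90ci(20, 80, rng)  # hypothetical input
        samples.append(hours_saved * value_per_hour)
    return sorted(samples)


def quantile(sorted_samples, q):
    """Empirical quantile by index into the sorted samples."""
    idx = min(int(q * len(sorted_samples)), len(sorted_samples) - 1)
    return sorted_samples[idx]


samples = simulate()
p5, p50, p95 = (quantile(samples, q) for q in (0.05, 0.5, 0.95))
print(f"5%: {p5:,.0f}  median: {p50:,.0f}  95%: {p95:,.0f}")
```

The output is a range rather than a point estimate, which is the property the tools above are built around.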

Research output has been consistently conceptual: proposals for "scorable functions" (code-as-forecast), "opinion fuzzing" (ensemble LLM sampling), "evaluation consent policies," and AI-driven question resolution protocols. A collaboration with Arb Research produced the Shallow Review of Technical AI Safety 2025 (800+ papers/posts catalogued).
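To make "scorable functions" (code-as-forecast) more concrete, here is a minimal Python sketch of one possible reading: a forecast is submitted as a function rather than a single number, and that function is scored against resolved outcomes. The function `forecast_rain`, its input, the resolution data, and the choice of the Brier score are all illustrative assumptions and are not taken from QURI's proposal.

```python
from typing import Callable, Sequence


def forecast_rain(humidity: float) -> float:
    """Hypothetical forecast-as-code: maps an input to a probability in [0, 1]."""
    return min(1.0, max(0.0, (humidity - 0.3) / 0.6))


def brier_score(
    forecast: Callable[[float], float],
    inputs: Sequence[float],
    outcomes: Sequence[int],
) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; an uninformative constant 0.5 forecast scores 0.25.
    """
    if len(inputs) != len(outcomes) or not inputs:
        raise ValueError("inputs and outcomes must be non-empty and equal length")
    total = sum((forecast(x) - y) ** 2 for x, y in zip(inputs, outcomes))
    return total / len(inputs)


# Made-up resolution data, for illustration only.
humidity_readings = [0.3, 0.6, 0.9]
it_rained = [0, 1, 1]

score = brier_score(forecast_rain, humidity_readings, it_rained)
baseline = brier_score(lambda _h: 0.5, humidity_readings, it_rained)
print(f"function score: {score:.3f}, always-50% baseline: {baseline:.3f}")
```

Because the forecast is code, it can be evaluated on inputs that were not known when it was written, which is what distinguishes it from a single point forecast.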

The 2022 Squiggle competitions offer the hardest adoption data: 3 long entries for a $1,000 prize, 5 entries (mostly short) for a $5,000 prize, from ~1,000-1,700 EA Forum views.

Key People

Ozzie Gooen -- Executive Director, sole full-time employee. Harvey Mudd engineering. EA since ~2008 (co-founded .impact 2013, which became Rethink Charity). Founded Guesstimate (2016), FHI Research Scholar, then QURI (2019). Core worldview: utilitarian, mistake theorist, estimation maximalist. P(AI net bad) ~15% -- unusually low for EA AI safety. Supports faster AI development.

Board (3 members, all unpaid): Andrew Critch (Secretary) -- CEO of Encultured AI, Research Scientist at CHAI, volunteers for SFF. Ben Goldhaber (Treasurer) -- Director at FAR Labs. Abigail Olvera -- Research Director at Golden Gate Institute for AI.

Team has shrunk over time. Key collaborators have departed: Nuno Sempere (Metaforecast builder) left for Sentinel, and Eli Lifland moved to AI safety research. Slava Matyuhin is listed as a "previous contractor" despite releasing Squiggle 0.10.0 in January 2025.

Money and Incentives

The financial picture is deliberately incomplete. Since 2023, QURI's operations have flowed through Rethink Priorities (RP) fiscal sponsorship. The 501(c)(3) still holds assets, but its filings understate actual spending. Ozzie's 2024 compensation from the 501(c)(3) was $0, meaning he is likely paid through RP.

501(c)(3) financials (EIN 84-3847921):

Year	Revenue	Expenses	Total Assets	Ozzie Comp (from 501c3)
2024	$87K	unknown	$100K	$0
2023	$1.5K	$279K	$304K	$36K
2022	$679K	$285K	$581K	$69K
2021	$2.5K	$85K	$187K	$47K
2020	$291K	$21K	$270K	$0
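
As a rough consistency check on the table above, one can test whether each year's change in total assets roughly equals revenue minus expenses. The Python sketch below does this for the years where all three figures are reported. It assumes no significant liabilities or asset revaluations, which the source does not confirm, and it uses the rounded figures from the table.

```python
# Figures in $K, copied from the table above; 2024 expenses are not reported.
FINANCIALS = {
    2020: {"revenue": 291.0, "expenses": 21.0, "assets": 270.0},
    2021: {"revenue": 2.5, "expenses": 85.0, "assets": 187.0},
    2022: {"revenue": 679.0, "expenses": 285.0, "assets": 581.0},
    2023: {"revenue": 1.5, "expenses": 279.0, "assets": 304.0},
    2024: {"revenue": 87.0, "expenses": None, "assets": 100.0},
}


def reconcile(financials):
    """Return {year: residual}, residual = asset change - (revenue - expenses)."""
    residuals = {}
    years = sorted(financials)
    for prev, year in zip(years, years[1:]):
        row = financials[year]
        if row["expenses"] is None:
            continue  # cannot reconcile a year with unreported expenses
        asset_change = row["assets"] - financials[prev]["assets"]
        net_income = row["revenue"] - row["expenses"]
        residuals[year] = asset_change - net_income
    return residuals


residuals = reconcile(FINANCIALS)
for year, gap in residuals.items():
    print(f"{year}: unexplained change of {gap:+.1f}K")
```

With these rounded figures the residuals for 2021-2023 stay within about $1K, which suggests the table is internally consistent for those years. 2020 has no prior-year assets to compare against, and 2024 is skipped because its expenses are unknown.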

Identified funding sources:

SFF (Jaan Tallinn): ~$733K total (2020-2024). Comprises ~79% of all identified funding.
Future Fund (FTX): ~$200K (2022, defunct).
Long-Term Future Fund: Amount unknown.
Open Philanthropy: $0. Zero grants in the org's entire history.
Individual donors: Unknown amounts.
Product revenue: $0. All tools are free.
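
The ~79% figure can be reproduced from the amounts listed above. The following is a minimal Python check that counts only sources with a stated dollar amount; the Long-Term Future Fund and individual donors are excluded because their amounts are unknown.

```python
# Identified funding in $K, from the list above; None marks unknown amounts.
FUNDING = {
    "SFF": 733,
    "Future Fund": 200,
    "Long-Term Future Fund": None,
    "Open Philanthropy": 0,
    "Individual donors": None,
    "Product revenue": 0,
}


def shares_of_known(funding):
    """Each source's share of the total across sources with known amounts."""
    known = {name: amount for name, amount in funding.items() if amount is not None}
    total = sum(known.values())
    return {name: amount / total for name, amount in known.items()}


shares = shares_of_known(FUNDING)
sff_share = shares["SFF"]
print(f"SFF share of identified funding: {sff_share:.1%}")
```

Because the unknown amounts are left out of the denominator, the true SFF share can only be at or below this figure.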

Estimated actual operating budget: $150K-$250K/year (inference from 1 FTE + 1 contractor + API costs + RP overhead).

Key financial observations:

Extreme funder concentration: one effective source (Jaan Tallinn/SFF) provides the vast majority of funding.
Revenue is extremely lumpy, driven by irregular SFF grants.
No path to financial sustainability. Zero earned revenue. No business model beyond grants.
Ozzie in February 2025: "Funding has been a highly significant bottleneck recently."
Open Philanthropy's complete absence across 7 years is a strong negative signal about how the largest EA funder views QURI's approach.

Governance overlap: Andrew Critch sits on QURI's 3-person board AND volunteers for SFF, QURI's primary funder. This is a textbook conflict of interest for a small nonprofit.

What Others Say

No external critical assessment of QURI was found. Despite extensive searching, no independent voice appears to have published a substantive critique of QURI's approach. The org is likely too small and niche to attract critical attention.

Ozzie's self-criticism is the strongest available critique. In June 2023: QURI has "2 people," he feels "very isolated," and expects his "professional development to be slower." He explicitly identifies that EA has "gone too far along the axis of having lots of tiny organizations with poor ability for coordination" -- a description that applies directly to QURI itself.

On funding constraints and criticism: Ozzie acknowledges he cannot fully criticize "groups with much power over QURI" -- i.e., his funders. The person building evaluation tools is constrained from honest evaluation of his own funder network.

Broad forecasting skepticism: The general critique of QURI's space is that better forecasting does not demonstrably change decisions. Ozzie himself acknowledges "few people care about high-quality intellectual work" and forecasting organizations have "struggled to produce decision-relevant estimates." He notes that even 50% more accurate forecasts "would take time for many people to notice."

Adoption data as implicit criticism: 30 external Squiggle AI users. 3 and 5 competition entries from ~1,000-1,700 views. 10 people staying for office hours at EA Global. These numbers are the community's revealed preference.

What's Absent

No documented instance of any QURI tool changing a real-world decision in 7 years of operation.
No public usage metrics for Squiggle Hub, Longterm Wiki, or RoastMyPost.
No independent evaluation of QURI's effectiveness by any charity evaluator, funder report, or external researcher.
No published plan for scaling beyond 1-2 people, despite Ozzie's own diagnosis of small-org problems.
No peer-reviewed publications for a self-described "research institute."
Actual operating budget obscured by RP fiscal sponsorship structure.

Stated Theory of Change

QURI's stated path to reducing AI risk is indirect and meta-level. The causal chain:

Build tools for probabilistic estimation at scale (Squiggle, Squiggle Hub, Squiggle AI)
These tools enable "super duper forecasters" -- small, hedge-fund-like teams -- to cheaply estimate the value of everything
Better estimates lead to better coordination among EA orgs, governments, and other decision-makers
Better coordination reduces the risk of catastrophic outcomes, including from AI

The recent "Epistemic AI" pivot adds a parallel chain:

Use LLMs to automate evaluation and estimation work (RoastMyPost, opinion fuzzing, AI question resolution)
"AI Intellectuals" gradually become trusted for strategic analysis
Better AI-powered epistemics reduce the risk of epistemic lock-in (where different groups use different AI reasoning systems that reinforce existing beliefs)
Correct beliefs win out, leading to better decisions about AI governance

The mechanism connecting these tools to AI risk reduction specifically is diffuse. Ozzie's ~15% P(AI net bad) and support for faster AI development mean he is less focused on preventing AI catastrophe than on building general epistemic infrastructure that could help with many problems, including AI.

Revealed Theory of Change

What QURI actually does, judged by allocation of time and resources:

Primary activity: Build and iterate on Squiggle (the programming language). This has consumed most of QURI's development effort since 2020. After more than five years and multiple developers (Slava Matyuhin, Sam Nolan, Quinn Dougherty), the tool remains below v1.0, and the only documented adoption figure is ~30 external users of Squiggle AI.

Secondary activity: Publish blog posts and proposals about how estimation, forecasting, and epistemic AI could work. These are conceptual/visionary rather than empirical. They propose frameworks (scorable functions, opinion fuzzing, evaluation consent, AI question resolution) that require adoption by platforms QURI does not control.

Tertiary activity: Build adjacent tools (RoastMyPost, Longterm Wiki, Metaforecast) that demonstrate the concept but haven't achieved wide adoption.

Notable divergence from stated theory: The stated theory requires that people USE the tools and that usage CHANGES DECISIONS. Neither condition has been demonstrated. The revealed theory of change is closer to: "build intellectually interesting tools and publish thoughtful proposals, hoping the ideas propagate." This is a research agenda, not an impact pathway.

Key Assumptions

Assumption 1: Coordination failures are primarily estimation failures.

Evidence for: Some coordination failures clearly are (e.g., voters lacking information about policy impacts). Forecasting literature shows systematic overconfidence in expert judgment.
Evidence against: Many coordination failures are driven by power, incentives, and identity, not information gaps. Climate change is well-estimated but poorly addressed. Ozzie's own example -- FTX -- was a fraud, not an estimation error.
Testable? Partially. If QURI's tools were widely adopted and decisions didn't improve, this would be falsified. But adoption hasn't happened, so the assumption remains untested.
What changes if wrong: The entire theory of change collapses. If better estimation doesn't lead to better coordination, QURI's tools are intellectually interesting but practically irrelevant.

Assumption 2: People will adopt estimation tools if the tools are good enough.

Evidence for: Spreadsheets are ubiquitous. Some EA orgs use Guesstimate. LLMs are being rapidly adopted for various tasks.
Evidence against: Seven years of QURI products with minimal adoption. The 2022 Squiggle competitions attracted 3 and 5 entries from ~1,000-1,700 views. Squiggle AI had 30 external users. Professional forecasters still prefer simpler formats. Ozzie himself acknowledges "few people care about high-quality intellectual work."
Testable? Yes, and the current evidence is negative. The LLM-based pivot (Squiggle AI, RoastMyPost) is a bet that AI can do the work humans won't.
What changes if wrong: The shift to LLM-based tools is the backup plan. If people won't use manual estimation tools, maybe AI can do the estimation for them. This is a reasonable pivot.

Assumption 3: Epistemic AI is neglected, important, and tractable.

Evidence for: Metaculus AI forecasting competitions show LLM ensembles can match human forecasters. Ozzie's case that strategic thinking is automatable is plausible given coding/medical progress.
Evidence against: Major AI labs (Anthropic, OpenAI, Google) are all working on truthful and helpful AI. Perplexity, Elicit, and many startups are building AI research tools. The "neglected" claim is weakened by this activity.
Testable? Yes, by whether QURI's specific tools outperform or complement general-purpose AI assistants.
What changes if wrong: If Epistemic AI is not neglected (i.e., big labs do it better), QURI's niche disappears.

Strengths

Intellectual honesty. Ozzie is remarkably transparent about QURI's weaknesses, his own constraints (can't criticize funders), and the limitations of his tools. His "Downsides of Small Organizations" post is a genuine self-critique that few org leaders would publish.

Long-term vision coherence. From Guesstimate (2016) through Squiggle (2020) to Epistemic AI (2024), the core thesis has been consistent: better estimation at scale matters. The specific tools change, but the direction is stable.

Early positioning in Epistemic AI. If LLMs do enable cheap, reliable evaluation and estimation, QURI has spent years thinking about the frameworks, interfaces, and trust mechanisms this requires. The conceptual work (scorable functions, opinion fuzzing, epistemic selection protocols) could become relevant.

The Longterm Wiki. A wiki of ~736 pages tracking 527 organizations and 696 people in the AI safety ecosystem is genuinely useful infrastructure, regardless of QURI's other tools.

Low cost. At an estimated $150K-$250K/year, QURI is extremely cheap compared to most AI safety orgs. Even modest contributions to the ecosystem (Shallow Review, Longterm Wiki, Metaforecast) may justify this cost.

Weaknesses and Risks

Seven years without demonstrated impact. The most serious weakness. No documented instance of any QURI tool changing a decision. No usage metrics published. No external evaluation. When adoption data does exist (30 users, 3 competition entries), it's discouraging.

Single point of failure. QURI is Ozzie. If he loses motivation, gets sick, or pivots to something else, QURI ends. No succession plan, no institutional resilience, no second person who deeply understands the codebase and vision.

Chronic financial fragility. ~79% of identified funding comes from one source (Jaan Tallinn via SFF). Zero from Open Philanthropy. No earned revenue. Ozzie describes funding as a "highly significant bottleneck." An org that has been fundraising for 7 years without securing stable multi-year funding may never secure it.

The adoption gap. QURI builds tools for a problem (poor estimation) that decision-makers don't perceive they have. Forecasting platforms exist, prediction markets exist, LLM assistants exist -- and all of these have more users than Squiggle. The question "why will someone use this?" has not been convincingly answered.

Funder-worldview mismatch. QURI's primary funder (SFF/Jaan Tallinn) presumably funds AI safety work because they believe AI risk is severe. Ozzie puts P(AI net bad) at ~15% and supports faster AI development. This is a tension that could lead to funding withdrawal if funders realize the mismatch.

Project proliferation without completion. Guesstimate, Squiggle, Squiggle Hub, Squiggle AI, Metaforecast, relative values, RoastMyPost, opinion fuzzing, Longterm Wiki, scorable functions, evaluation consent, Shallow Review collaboration -- for a 1-2 person org over 7 years, this is too many projects. Many remain half-built or in maintenance mode.

Cross-References

Metaculus -- The closest comparison in forecasting infrastructure. Metaculus has actual users (thousands), prediction tournaments, institutional clients, and demonstrated forecasting accuracy. QURI's Squiggle is theoretically more expressive (functions vs. point estimates) but has orders of magnitude fewer users. QURI's scorable functions proposal would need a platform like Metaculus to adopt it, and Metaculus has not.

Rethink Priorities -- QURI's fiscal sponsor. RP is a much larger research org (~40 staff) that does quantitative cause prioritization research. The fiscal sponsorship relationship suggests RP sees some value in QURI's work, but the details are opaque.

MIRI -- Opposite end of the AI safety theory-of-change spectrum. MIRI works on fundamental alignment theory; QURI works on meta-level epistemic infrastructure. MIRI has also struggled with demonstrated impact, but has published influential technical work. QURI has not produced work of comparable influence.

Elicit/Ought -- AI-powered research tools with similar "better epistemics" goals. Elicit has actual users and VC funding. QURI's approach is more research-oriented and less product-oriented.

What Would Change This Assessment

Usage metrics showing significant adoption. If Squiggle Hub had 1,000+ active users, or RoastMyPost was routinely used by EA Forum writers, the "no adoption" critique dissolves.
A documented case where QURI tools changed a real decision. One compelling case study -- a funder using Squiggle models to allocate grants, an org using relative values to prioritize -- would dramatically strengthen the theory of change.
Stable multi-year funding from a second major source. If Open Philanthropy funded QURI, or if a new funder provided multi-year support, the financial fragility concern diminishes.
Hiring a second full-time person who stays. This would indicate the org is growing, not just sustaining one person's research agenda.
External validation of Epistemic AI thesis. If a major AI lab or research group began citing QURI's frameworks, or if RoastMyPost evaluations became a standard in EA writing, the "too niche" concern weakens.

Self-Critique

What sources should I have checked but didn't:

Rethink Priorities' 990s, which might reveal QURI's actual operating budget through the fiscal sponsorship.
The Doom Debates podcast full transcript (April 2025), which would show Ozzie's evolved AI safety views.
Ozzie's EA Forum shortform posts (quick takes), which contain extensive writing about AI policy and P(doom).
Squiggle Discord member count and activity level.

Where this analysis is potentially biased:

I may be too harsh on QURI because the adoption metrics are so low. It's possible that the tools are highly valuable to a small number of users in ways not captured by public metrics.
I weight "demonstrated impact" heavily. QURI could argue that intellectual infrastructure takes decades to pay off, and 7 years is too early to judge.
My framework favors legible, measurable impact. QURI's influence on how EA thinks about estimation and forecasting may be real but unmeasurable.

What would a thoughtful person who disagrees say: "You're judging a research institute by product metrics. Squiggle doesn't need a million users to be valuable -- it needs to enable a few key models that change how EA allocates resources. The Longterm Wiki, the Shallow Review, the conceptual frameworks -- these shape how the community thinks. That's QURI's real theory of change, and it's working."

Single weakest claim: My claim that Open Philanthropy's absence is a strong negative signal. OP's grant decisions reflect their priorities and capacity, not necessarily quality judgments about every applicant. Some excellent orgs simply haven't applied or don't fit OP's current categories.

What information would most change my view: Evidence that Squiggle models have been used in actual EA funding decisions, or that the Longterm Wiki is routinely consulted by major funders. This would show that QURI's tools matter even with low user counts -- quality of users matters more than quantity.

Connected to (9)

Arb Research -- collaborator

CHAI -- board overlap · Andrew Critch

Rethink Priorities -- collaborator

Encultured AI -- board overlap · Andrew Critch

FAR AI -- board overlap · Ben Goldhaber

Institute for AI Policy and Strategy -- advisor at · Peter Wildeford

Sentinel -- staff to · Nuno Sempere

Survival and Flourishing Fund -- board overlap · Andrew Critch

Future of Humanity Institute -- staff from · Ozzie Gooen

Sources (42)

Every URL that was read during research.

1. About (quantifieduncertainty.org)
2. Team (quantifieduncertainty.org)
3. The Quantified Uncertainty Research Institute Blog (quantifieduncertainty.org)
4. Squiggle (squiggle-language.com)
5. About Squiggle Hub (squigglehub.org)
6. Ozzie Gooen (mutualunderstanding.substack.com)
7. About - The QURI Medley (quri.substack.com)
8. Eli and Ozzie: Is Forecasting a Promising EA Cause Area? (quantifieduncertainty.org)
9. New Collaboration: Shallow Review of Technical AI Safety, 2025 (quantifieduncertainty.org)
10. Announcing RoastMyPost (quantifieduncertainty.org)
11. Upcoming Workshops: Automated Research Wikis with Claude Code (quantifieduncertainty.org)
12. Opinion Fuzzing: A Proposal for Reducing & Exploring Variance in LLM Judgments Via Sampling (quantifieduncertainty.org)
13. A Sketch of AI-Driven Epistemic Lock-In (quantifieduncertainty.org)
14. Evaluation Consent Policies (quantifieduncertainty.org)
15. 6 (Potential) Misconceptions about AI Intellectuals (quantifieduncertainty.org)
16. $300 Fermi Model Competition (quantifieduncertainty.org)
17. GitHub - quantified-uncertainty/squiggle: This monorepo covers multiple QURI projects, including Squiggle language, Squiggle Hub and Metaforecast (github.com)
18. The Quantified Uncertainty Research Institute (github.com)
19. Presentation: Estimating Everything Everywhere Always (quantifieduncertainty.org)
20. Scorable Functions: A Format for Algorithmic Forecasting (quantifieduncertainty.org)
21. Donate (quantifieduncertainty.org)
22. Squiggle Overview (2020) (quri.substack.com)
23. Updated LLM Models for SquiggleAI (quantifieduncertainty.org)
24. About (shallowreview.ai)
25. Squiggle AI, Published to the EA Forum (quantifieduncertainty.org)
26. Recent Updates (quantifieduncertainty.org)
27. Shape Squiggle's Future: Take our Squiggle Survey (quantifieduncertainty.org)
28. About This Wiki (longtermwiki.com)
29. AI Could Give Humans MORE Control - Ozzie Gooen - Doom Debates - Podcast on iVoox (ivoox.com)
30. Beyond Spell Check: 15 Automatable Writing Quality Checks (quantifieduncertainty.org)
31. Ozzie Gooen - The Quantified Uncertainty Research Institute Blog (quantifieduncertainty.org)
32. Eli Lifland, on Navigating the AI Alignment Landscape (quri.substack.com)
33. Squiggle 0.10.0 (quantifieduncertainty.org)
34. Announcing Squiggle Hub (quantifieduncertainty.org)
35. Six Challenges with Criticism & Evaluation Around EA (quri.substack.com)
36. Downsides of Small Organizations in EA (quri.substack.com)
37. Squiggle 0.8.6 (quantifieduncertainty.org)
38. AI for Resolving Forecasting Questions: An Early Exploration (quantifieduncertainty.org)
39. Winners of the Squiggle Experimentation and 80,000 Hours Quantification Challenges (quantifieduncertainty.org)
40. Ben Goldhaber (bengoldhaber.com)
41. Quantified Uncertainty Research Institute - Nonprofit Explorer - ProPublica (projects.propublica.org)
42. Misha Yagudin and Ozzie Gooen Discuss LLMs and Effective Altruism (quri.substack.com)