
Epoch AI

Forecasting

Compute trends. AI timelines.

Founded: 2022
HQ: San Francisco, CA (remote-first)
Team: 21
Structure: 501(c)(3) nonprofit
Model: Mixed

Theory of Change

Epoch AI positions itself as the empirical foundation for AI policy and strategy decisions. Jaime Sevilla draws an explicit parallel to Nobel laureate William Nordhaus's climate economics work: "We want to do something similar for artificial intelligence to what William Nordhaus did for climate change. He set the basis for rigorous study and thoughtful action guided by evidence."

The causal chain: collect data on AI trends -> analyze and publish findings -> inform policymakers, researchers, and the public -> better decisions about AI. The org is deliberately neutral on whether AI progress is good or bad. Sevilla (June 2025): "Our staff, and the AI community more broadly, is split on whether advancing AI will ultimately benefit society. As an organization, we are decidedly neutral on this question."

The founding mission (2022) was more directly safety-oriented: "clarify when and how transformative AI capabilities will be developed," working in close collaboration with Open Philanthropy and Rethink Priorities. The current framing emphasizes broad societal benefit rather than existential risk reduction specifically. All funding from Coefficient Giving (CG)/Open Philanthropy (OP) is classified under "potential risks from advanced artificial intelligence."

What They Do

Core data products. The compute trends database tracks training compute for 120+ significant ML models since 1950 -- widely acknowledged as the most comprehensive such dataset. The founding paper "Compute Trends Across Three Eras of Machine Learning" (Feb 2022) corrected an earlier OpenAI estimate, finding compute doubles every ~6 months rather than 3.4 months. Additional databases track ML hardware, GPU clusters, frontier data centers, and AI companies.
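
The two doubling times are easier to compare once each is converted to an annual growth factor. The following is a minimal sketch that assumes smooth exponential growth; the function name is illustrative and does not come from Epoch's paper.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Growth multiple per year implied by a doubling time given in months."""
    return 2 ** (12 / doubling_months)


# ~6-month doubling (the Epoch paper's estimate, per the text above)
epoch_estimate = annual_growth_factor(6.0)
# 3.4-month doubling (the earlier OpenAI estimate, per the text above)
earlier_estimate = annual_growth_factor(3.4)

print(f"6-month doubling:   {epoch_estimate:.1f}x per year")
print(f"3.4-month doubling: {earlier_estimate:.1f}x per year")
```

Under these assumptions, a 6-month doubling time corresponds to 4x per year, while a 3.4-month doubling time corresponds to roughly 11-12x per year, so the correction is far larger than the difference in months suggests.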

Key research. "Will We Run Out of Data?" (2022, updated 2024) estimated text data exhaustion between 2026-2032, identifying a bottleneck years before it became mainstream. "Can AI Scaling Continue Through 2030?" (commissioned by Google DeepMind) concluded bottlenecks are surmountable. "Three Challenges Facing Compute-Based AI Policies" (co-authored with Google DeepMind) argued that compute as a regulatory proxy is breaking down. The Epoch Capabilities Index (ECI) identified an acceleration in AI capabilities around April 2024. The GATE macroeconomic model analyzes conditions for explosive economic growth from AI automation.

Benchmarking. FrontierMath: 300 research-level math problems evaluating frontier AI reasoning. It is currently the most prominent private math benchmark, though structurally compromised by OpenAI's exclusive ownership and access (see Money and Incentives). Tier 4 was completed in 2025. Epoch is also developing a software engineering benchmark with METR and a benchmark on open math problems with Schmidt Sciences.

Policy engagement. Briefings to the U.S. Congress and the UK AI directorate. Consultations with the EU AI Office and UK DSIT. Epoch data informed compute thresholds in the EU AI Act and the Biden Executive Order.

2025 output. 38 data insights, 14 reports, 40 newsletter issues, 5 podcasts; 987K website users; $5M spent, 21 staff.

Key People

Jaime Sevilla, Director. Spanish (~30), math + CS background, paused Aberdeen PhD to found Epoch. Candid about uncertainty, intellectually honest. His quip after a co-founder launched a capabilities startup: "Yay just what I wanted for my bday: a comms crisis."

Notable departures form a striking pattern. Tamay Besiroglu (co-founder, associate director) left in April 2025 to found Mechanize, a startup aimed at "full automation of the economy" and backed by Nat Friedman, Patrick Collison, and Jeff Dean; he took researchers Ege Erdil and Matthew Barnett with him. Marius Hobbhahn (co-founder) left in April 2023 to found Apollo Research (AI safety evaluations). Lennart Heim (co-founder) left for RAND/GovAI (compute governance). Elliot Glazer (lead mathematician) left to join Principia Labs. Of the three co-founders named here, one went to a capabilities startup, one to safety evaluations, and one to compute governance.

Team. 21 full-time staff as of 2025, remote-first, globally distributed.

Money and Incentives

Total known funding: ~$23.8M (2022-2025). Roughly 91% from a single source:

Source                                  Amount         % of Known Total
Coefficient Giving / Open Phil          $21,773,611    ~91%
Jaan Tallinn                            $600,000       ~2.5%
Likith Govindaiah                       $400,000       ~1.7%
Leopold Aschenbrenner (via Manifund)    $200,000       ~0.8%
Sentinel Bio                            $165,000       ~0.7%
Carl Shulman                            $100,000       ~0.4%
Paid engagements (est.)                 Unknown        Unknown
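
The shares in the table can be re-derived from the listed amounts. The sketch below is a rough check, assuming the percentages are computed against the ~$23.8M headline total (a rounded figure); the variable names are illustrative.

```python
# Amounts as listed in the table above (USD).
grants = {
    "Coefficient Giving / Open Phil": 21_773_611,
    "Jaan Tallinn": 600_000,
    "Likith Govindaiah": 400_000,
    "Leopold Aschenbrenner (via Manifund)": 200_000,
    "Sentinel Bio": 165_000,
    "Carl Shulman": 100_000,
}
stated_total = 23_800_000  # "~$23.8M" headline figure; approximate by construction

itemized_total = sum(grants.values())
shares = {name: amount / stated_total for name, amount in grants.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Itemized rows sum to ${itemized_total:,}")
print(f"Not itemized (approx.): ${stated_total - itemized_total:,}")
```

The itemized rows sum to about $23.24M, so roughly $0.56M of the headline total is not broken out in the table. The table does not say what accounts for that gap, and because the headline figure is rounded, the gap is only approximate.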

Annual spend. ~$5M in 2025 for 21 staff. Fundraising target: an additional $3M, though Epoch could deploy $10M.

Business model. Primarily philanthropic grants (~80%+ from CG/OP), supplemented by paid consultations with AI labs, governments, and investors. Clients include OpenAI, Google DeepMind, xAI, METR, UK DSIT, Sequoia Capital, Bridgewater, EU AI Office, Anthropic (small pilot).

FrontierMath financial structure. OpenAI commissioned and funded the 300 benchmark problems. OpenAI retains ownership and has access to all problem statements and solutions (except for a 50-problem holdout set, for which only the statements are shared). Epoch cannot share problems with other AI companies without OpenAI's written permission. The only safeguard against OpenAI training on the data is a "verbal agreement."

Semiconductor stock investments. Epoch invests reserves in semiconductor/AI stocks. They frame this as a hedge -- "any gains from such investments increase our capacity to advance our mission in scenarios when our work matters most." This creates a financial interest in AI sector growth for an organization studying AI trajectory.

Board-funder overlap. Two of four board members (Tom Davidson, Ajeya Cotra) are from Coefficient Giving/Open Philanthropy, the source of 91% of funding. No independent board members exist -- the other two are the director (Sevilla) and the CFO/Secretary/Head of Ops (de la Lama, who holds board + executive roles simultaneously).

Key incentive questions. (1) Does extreme funder concentration constrain what Epoch can research or publish? (2) Does revenue from AI labs create pressure to avoid conclusions that displease lab clients? (3) Do semiconductor stock investments create pressure toward bullish AI progress findings? (4) Could Epoch survive if CG/OP withdrew support? There is evidence of editorial independence: Epoch published FrontierMath evaluation results contradicting OpenAI's claimed scores (11% vs. 32% for o3-mini). But structural incentives persist regardless of current behavior.

What Others Say

The Mechanize critique. Oliver Habryka (LessWrong founder): "This seems like approximate confirmation that Epoch research was directly feeding into frontier capability work." Anthony Aguirre (FLI): "Huge respect for the founders' work at Epoch, but sad to see this." Holly Elmore (PauseAI): "This is the opposite of keeping the world safe from powerful AI! You are a traitor."

Paul Romer on FrontierMath. Romer (Nobel laureate, Jan 2025) argues that Elliot Glazer "does not understand the world in which he now works": the holdout set is "an obvious sham" because OpenAI owns all problem statements, and "monopolistic impunity is corroding the social norms of integrity." It is a structural argument about how tech industry incentives differ from academic norms.

Methodological critique. Steven Byrnes (LW, 114 karma): argues Epoch's headline ~3x/year algorithmic progress number is misleading, conflating distinct types of improvements. Real "stereotypical learning algorithm improvements" may be <10x over the entire 2018-2025 period (~30%/year), far less than the exponential suggested. Three other independent estimates arrive at similar ~3x numbers, but Byrnes argues they're measuring different things.
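
The size of this disagreement is clearest when both rates are compounded over the same period. The sketch below is purely illustrative arithmetic: it assumes seven annual steps for 2018-2025 and constant rates, neither of which is claimed by Epoch or by Byrnes.

```python
years = 7  # 2018 to 2025 treated as seven annual steps (an assumption)

headline_rate = 3.0                       # ~3x per year headline figure
headline_total = headline_rate ** years   # cumulative gain if 3x/year held throughout

byrnes_rate = 1.30                        # ~30% per year
byrnes_total = byrnes_rate ** years       # cumulative gain at ~30%/year

# Constant annual rate that would produce exactly 10x over the whole period.
cap_rate = 10 ** (1 / years)

print(f"3x/year over {years} years:   {headline_total:,.0f}x")
print(f"30%/year over {years} years:  {byrnes_total:.1f}x")
print(f"Rate for 10x in {years} years: {cap_rate:.2f}x per year")
```

Compounded this way, 3x/year implies a gain in the thousands, while ~30%/year implies a single-digit multiple. A gap of more than two orders of magnitude is hard to attribute to measurement noise, which fits Byrnes's argument that the estimates are measuring different things.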

Policy credibility. "Epoch is highly trusted by all camps. At least in DC federal policy circles, I've heard people say 'I like what Epoch wrote'... You're not seen as having 'motivated reasoning.'" -- Golden Gate Institute for AI. "Epoch is probably the single organization I cite the most in my writing." -- Institute for Progress. "A key crux for CG's AI grantmaking strategy is AI timelines. Epoch reports have been a frequent source of parameter value estimates." -- Coefficient Giving managing director. "We used the Epoch Capabilities Index to determine what the recent rate of software progress has been." -- AI Futures Project.

What's Absent

No 990 financial filings yet (too new as independent nonprofit; first filing expected late 2025/early 2026). No independent board members -- all four have financial relationships with Epoch or its primary funder. No published conflict of interest or recusal procedures. No external audit of the core compute database methodology. No published forecasting track record assessing accuracy of past predictions. No breakdown of revenue from commercial vs. philanthropic sources. No documented policy for managing the tension between semiconductor stock investments and research neutrality.

Recommended Reading

  1. AXRP Episode 37: Jaime Sevilla on Forecasting AI (Oct 2024) -- the most technically candid interview. Sevilla explains methodology, admits its limitations, discusses personal uncertainty on x-risk. Start here. https://axrp.net/episode/2024/10/04/episode-37-jaime-sevilla-forecasting-ai.html

  2. "A Harsh Lesson About the World of Tech" by Paul Romer (Jan 2025) -- Nobel laureate's structural critique of FrontierMath and tech industry integrity. The strongest case that Epoch was naive about incentives. https://paulromer.net/harsh-lessons-about-tech/

  3. "What is Epoch?" by Jaime Sevilla (June 2025) -- the definitive mission statement, addressing departures, neutrality, and the Mechanize crisis. https://epoch.ai/blog/what-is-epoch

  4. "Is it 3 years, or 3 decades away?" AGI Timelines Podcast (March 2025) -- Erdil vs. Barnett debate shows genuine internal intellectual diversity. Both later departed to Mechanize. https://epoch.ai/epoch-after-hours/disagreements-on-agi-timelines

  5. "The nature of LLM algorithmic progress (v2)" by Steven Byrnes (LessWrong) -- the most serious methodological challenge to Epoch's core claims about algorithmic efficiency gains. https://www.lesswrong.com/posts/sGNFtWbXiLJg2hLzK

Claude’s Analysis
An opinionated read. Read the brief first to form your own view.

Stated Theory of Change

Epoch's stated theory of change is that better information about AI trends leads to better decisions by all stakeholders. They aim to be the "William Nordhaus of AI" -- creating the empirical foundation that informs policy, investment, and research regardless of one's priors about whether AI is beneficial or dangerous.

The specific mechanism: collect and curate data on AI model training compute, hardware trends, algorithmic efficiency, and data availability. Analyze these trends to produce forecasts and identify bottlenecks. Publish findings openly. Engage with policymakers, labs, and the public to ensure decisions are "informed by the best possible evidence."

This is a deliberately neutral theory of change. Epoch does not advocate for specific policies or take a position on AI progress. Their bet is that the world is better off with high-quality public information about AI, regardless of which direction decisions ultimately go.

Revealed Theory of Change

Epoch's actions mostly align with their stated mission, but with notable tensions:

Alignment. They genuinely produce the best public data on AI compute trends. Their databases are cited by governments, labs, and researchers across ideological lines. They publish findings that sometimes contradict their clients (the o3-mini FrontierMath discrepancy). Their "Three Challenges" paper documented limitations of compute-based regulation that Epoch itself helped establish.

Divergences.

  1. The FrontierMath episode revealed a willingness to subordinate transparency to client relationships. Epoch agreed to NDAs that prevented disclosure to their own contributors. The "corrective actions" came only after public scandal. This suggests the neutrality commitment can be overridden by financial considerations.

  2. The pattern of departures suggests Epoch functions partly as a capabilities pipeline. Three co-founders left: one to a capabilities startup (Mechanize), one to AI safety evaluations (Apollo), one to compute governance policy (RAND). The Mechanize departure is the most revealing -- Besiroglu used Epoch's research on automation economics to build a company literally aimed at "full automation of all work."

  3. Revenue diversification toward AI labs creates subtle dependencies. Paid clients include OpenAI, Google DeepMind, xAI, Anthropic. While Epoch aims to charge "industry consultant rates," the growing commercial relationships create incentives to produce work that labs find useful rather than work that might embarrass them.

  4. Investing reserves in semiconductor/AI stocks creates a financial interest in AI progress for an organization that claims to be neutral about whether AI progress is beneficial.

Key Assumptions

Assumption 1: Better public information about AI leads to better decisions.

  • Evidence for: Epoch data informed compute thresholds in EU AI Act and Biden EO. Policymakers across ideological lines cite their work. CG uses their estimates to calibrate timelines and funding allocation.
  • Evidence against: Information is necessary but not sufficient. Having good data about AI progress doesn't prevent misuse if institutions are captured or incentives are misaligned. Also, good information about AI capabilities can help capabilities developers as much as safety researchers.
  • Testable: Yes -- do policies informed by Epoch data turn out to be well-calibrated?
  • If wrong: Epoch becomes a useful public good but doesn't meaningfully reduce AI risk.

Assumption 2: Organizational neutrality is possible and desirable.

  • Evidence for: Epoch is genuinely trusted by multiple camps. Policy credibility comes partly from not having an agenda.
  • Evidence against: Neutrality on existential risk is itself a position. Roughly 91% of known funding comes from a single funder whose grants to Epoch are all classified under "potential risks from advanced artificial intelligence." Being neutral while funded by safety-oriented donors could mean producing work that happens to be useful for safety without ever taking hard positions that might be needed.
  • Testable: Would Epoch publish findings strongly suggesting AI progress should be halted or dramatically slowed? We don't know.
  • If wrong: Epoch provides a veneer of scientific objectivity to a debate where the outcome depends on advocacy, not just facts.

Assumption 3: Compute trends are the right thing to track for understanding AI trajectory.

  • Evidence for: Compute has been the best predictor of AI capabilities. Epoch's own research shows training compute growing ~4-5x/year with strong predictive power.
  • Evidence against: Their own "Three Challenges" paper argues compute as a proxy is breaking down. Reasoning training, distillation, and deployment scaffolding increasingly drive capabilities beyond what compute alone predicts. Steven Byrnes's critique suggests algorithmic progress may be much less than Epoch's headline estimates.
  • Testable: Yes -- compare Epoch forecasts against reality over time.
  • If wrong: Epoch's core methodology becomes obsolete, though they're already pivoting toward benchmarking and capability indices.

Assumption 4: Independence is maintained despite extreme funder concentration.

  • Evidence for: Published FrontierMath results contradicting OpenAI's claims. Published "Three Challenges" paper questioning their own compute governance framework. Internal diversity of views (Erdil's bearish timelines vs. Barnett's bullish ones).
  • Evidence against: Two of four board seats are held by people from the primary funder. No independent directors. No published conflict of interest procedures. FrontierMath NDAs show willingness to compromise transparency for client relationships.
  • Testable: Would Epoch publish findings that directly threatened CG/OP funding or lab client relationships?
  • If wrong: Epoch's research subtly self-censors on questions that might displease funders.

Strengths

Unique positioning. No other organization occupies Epoch's niche: there is no comparable public database of AI compute trends, and no other nonprofit doing quantitative AI forecasting with this level of rigor. That makes Epoch very hard to replace.

Credibility across divides. Being trusted by governments, AI labs, safety researchers, and policy institutes simultaneously is extremely rare. This credibility is their most valuable asset.

Research quality and output. 21 people producing 38 data insights, 14 reports, and 40 newsletter issues per year is remarkable productivity. The research is technically sound, openly published, and widely cited.

Intellectual honesty. Sevilla's candid admissions of uncertainty ("I'm just very confused" about x-risk), willingness to correct past estimates (data limits paper revision), and publication of findings contradicting clients (o3-mini FrontierMath scores) all suggest genuine intellectual integrity at the leadership level.

Policy impact. Epoch data has materially shaped the most significant AI regulations to date (EU AI Act, Biden EO). This is concrete, measurable impact.

Weaknesses and Risks

Funder concentration. 91% of known funding from a single source (CG/OP) is extreme dependency. Two of four board members from that funder. The organization's survival depends on maintaining this relationship.

Governance gaps. No independent board members, no published conflict of interest procedures, dual-hatting of board + executive roles (de la Lama). These are standard nonprofit governance failures that become more concerning given the funder concentration and lab client relationships.

Capabilities pipeline risk. The Mechanize departure is not an isolated incident but part of a pattern of senior exits: three co-founders and the lead mathematician have left, and Mechanize alone took a co-founder and two researchers. Epoch's "neutral" research environment may be training and socializing researchers who subsequently apply their understanding of AI scaling to accelerate capabilities.

FrontierMath as cautionary tale. The disclosure failure revealed systemic issues: willingness to accept NDAs that prevented transparency, a "verbal agreement" as the only safeguard against training data contamination, structural ownership giving one company exclusive access. These are governance failures, not mere communication mistakes.

Methodological vulnerability. The core algorithmic progress claim (~3x/year) faces substantive challenge from Byrnes and others. If this number is significantly wrong, it undermines Epoch's most-cited research. The compute database, while unique, involves significant estimation and has never been externally audited.

Semiconductor stock investments. Creating a financial interest in AI sector growth while claiming research neutrality on AI progress is a structural conflict. Even if it doesn't actually influence research, the perception alone damages credibility.

Cross-References

Complementary to: METR (model evaluations -- Epoch is developing a benchmark with METR), GovAI/Centre for the Governance of AI (compute governance policy that builds on Epoch data), AI Safety Institute/UK DSIT (government capacity that uses Epoch analysis).

Revenue relationship with: OpenAI, Google DeepMind, xAI, Anthropic (paid clients). These relationships provide useful information flow but create subtle dependency.

Talent pipeline to: Mechanize (Besiroglu, Erdil, Barnett), Apollo Research (Hobbhahn), RAND/GovAI (Heim), Principia Labs (Glazer). Epoch has been remarkably productive at spawning new organizations, though the direction is mixed from a safety perspective.

Funded by same source as: Many AI safety organizations (CG/OP funds a large fraction of the field). This creates a correlation risk -- if CG/OP changes strategy, many orgs are affected simultaneously.

What Would Change This Assessment

Positive updates:

  • Independent board members appointed with no CG/OP or lab affiliation
  • External audit of compute database methodology published by credible third party
  • Published forecasting track record showing calibration quality
  • Revenue diversification reducing CG/OP to <50% of total
  • Published conflict of interest and recusal procedures
  • Evidence that Epoch published findings clearly uncomfortable for funders or clients

Negative updates:

  • Evidence that research conclusions were influenced by funder or client preferences
  • Further departures to capabilities-focused organizations
  • Failure to implement stated corrective actions on benchmark transparency
  • Continued expansion of lab client relationships without proportional governance strengthening
  • Suppression or delay of findings embarrassing to OpenAI or other clients
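
One of the positive updates above is a published forecasting track record showing calibration quality. A common way to summarize such a record is the Brier score: the mean squared error between forecast probabilities and outcomes. The sketch below uses hypothetical forecasts, not Epoch's, and the function name is illustrative.

```python
def brier_score(forecasts: list[tuple[float, bool]]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; always answering 50% scores 0.25.
    """
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)


# Hypothetical (probability, resolved_true) pairs -- NOT Epoch's forecasts.
example = [(0.9, True), (0.7, True), (0.2, False), (0.6, False)]

score = brier_score(example)
# Baseline: the same questions answered with 50% every time.
baseline = brier_score([(0.5, outcome) for _, outcome in example])

print(f"Brier score: {score:.3f} (50% baseline: {baseline:.3f})")
```

A track record that consistently beats the 50% baseline on resolved questions would be some evidence of forecasting skill. A fuller assessment would also bin forecasts by stated probability and check how often each bin resolved true.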

Self-Critique

Weakest claim: My assessment of the semiconductor stock investment conflict may overstate the risk. The amounts involved are likely small relative to total assets, and the framing as a "hedge" is logically coherent even if optically concerning. I may be pattern-matching to financial conflicts of interest in ways that don't map well to a small nonprofit.

Potential biases: I may be too sympathetic to Epoch's mission because their work is genuinely useful and their intellectual honesty is refreshing. The FrontierMath episode could be more damning than I'm framing it -- it might reveal a systematic willingness to prioritize revenue over transparency that hasn't been tested in other domains yet.

Missing perspective: I haven't engaged deeply with the view that Epoch's work is net negative for safety because it helps capabilities developers more than safety researchers. Habryka's critique is the closest to this, but a more developed version would argue that quantifying AI progress precisely makes it easier for labs to allocate resources efficiently to capability gains.

What a thoughtful disagreer would say: "Epoch is the ideal laundering mechanism for AI labs. Labs fund neutral research through philanthropy, the research is genuinely high-quality and trusted, and then the same labs use that research to build more powerful AI. The whole theory of change -- that better information leads to better decisions -- ignores the fact that labs have better information regardless and that public information mostly helps them recruit, fundraise, and shape policy in their favor. Epoch's neutrality is not a feature, it's a bug."

Information that would most change my view: A 990 filing showing that commercial revenue from AI labs is a much larger share of total revenue than I've estimated, or evidence that a research finding was delayed or softened due to client or funder pressure.

Connected to (14)

Stanford HAI: collaborator
Coefficient Giving: board overlap · Tom Davidson
Coefficient Giving: board overlap · Ajeya Cotra
Google DeepMind: collaborator
Institute for Progress: advisor at · Miles Brundage
Mechanize: staff to · Tamay Besiroglu
Mechanize: staff to · Ege Erdil
Mechanize: staff to · Matthew Barnett
METR: collaborator
Principia Labs: staff to · Elliot Glazer
Rethink Priorities: spun off from
OpenAI: collaborator
Apollo Research: staff to · Marius Hobbhahn
Centre for the Governance of AI: staff to · Lennart Heim