Timaeus

Research

Singular Learning Theory. Mathematical approach.

Founded: 2023
HQ: Berkeley, CA
Team: 20
Structure: 501(c)(3) nonprofit
Funding model: Grants

Theory of Change

Timaeus claims that Singular Learning Theory (SLT) -- a mathematical theory of Bayesian statistics using algebraic geometry -- provides fundamental tools for understanding how training data shapes model structure. The causal chain: (1) training data determines the geometry of the loss landscape, (2) loss landscape geometry determines which algorithms models learn, (3) understanding this chain enables new tools for interpretability ("reading" what models have learned) and alignment ("writing" desired properties into models more reliably).

Jesse Hoogland: "Currently all of our existing techniques for trying to align models look like this: Train the model on examples of the kind of behavior you would like to see. It's a very indirect process... We don't understand how it works and we don't know that the way it's actually changing models is deep or significant or robust or lasting."

Daniel Murfet frames the alignment relevance through Nate Soares's "capabilities generalize further than alignment" argument: capabilities are deeply inscribed by patterns pervasive in training data, while alignment interventions (RLHF, constitutional AI) may produce shallower structures that break under distribution shift. SLT's contribution would be tools to measure which structures are "deep" vs "shallow."

What They Do

Timaeus conducts fundamental research on SLT applications to AI safety, organized into three prongs: (1) near-term safety applications (UK AISI partnership on singular psychometrics and weight exfiltration), (2) interpretability via developmental interpretability -- tracking how models change during training using SLT-derived observables, and (3) longer-term alignment foundations.

Key research outputs:

  • ICLR 2025 Spotlight: "Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient" -- used the refined Local Learning Coefficient (LLC) to track how attention heads develop distinct roles during training
  • Best Paper, HiLD workshop at ICML 2024: "Loss Landscape Degeneracy and Stagewise Development in Transformers" -- demonstrated that small transformers progress through distinct developmental stages, detectable by the LLC even when hidden from the loss curve
  • AISTATS 2025: "The Local Learning Coefficient: A Singularity-Aware Complexity Measure" -- foundational method paper for estimating the LLC, the key geometric invariant
  • Scaled techniques from toy models to billions of parameters, finding that sampling cost scales sublinearly with model size
  • Discovered the "multigram circuit" -- a new circuit type for nested pattern matching, found using SLT tools
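
The Local Learning Coefficient that these papers rely on can be illustrated on a toy problem. The sketch below is not the devinterp API and not the authors' code: `estimate_llc` and every hyperparameter are hypothetical names and values chosen for illustration. It assumes the estimator form n * beta * (E[L(w)] - L(w*)) with beta = 1/log(n), where the expectation is taken over a posterior localized around w* by a quadratic penalty of strength gamma. It also swaps in plain full-gradient Langevin dynamics on hand-written one-dimensional losses in place of minibatch SGLD on a neural network. For L(w) = w^2 the theoretical learning coefficient is 1/2, and for the degenerate L(w) = w^4 it is 1/4, so the singular minimum should score lower.

```python
import numpy as np


def estimate_llc(loss, grad, w_star, n=10_000, gamma=1.0, step=1e-4,
                 chains=2000, steps=4000, burn_in=2000, seed=0):
    """Toy LLC estimate: n * beta * (E[L(w)] - L(w_star)), beta = 1 / log(n).

    Hypothetical helper for illustration only. Samples from the localized,
    tempered posterior  exp(-n*beta*L(w) - gamma/2 * (w - w_star)^2)
    using many independent Langevin chains started at w_star.
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.log(n)
    w = np.full(chains, float(w_star))  # one scalar parameter per chain
    total, count = 0.0, 0
    for t in range(steps):
        # Gradient of the negative log density of the localized posterior.
        drift = n * beta * grad(w) + gamma * (w - w_star)
        # Unadjusted Langevin step: half-step along -drift plus Gaussian noise.
        w = w - 0.5 * step * drift + np.sqrt(step) * rng.standard_normal(chains)
        if t >= burn_in:
            total += loss(w).mean()  # average loss across chains at this step
            count += 1
    expected_loss = total / count
    return n * beta * (expected_loss - loss(float(w_star)))


# Regular minimum (non-degenerate Hessian) versus singular minimum (flat to 4th order).
llc_regular = estimate_llc(lambda w: w**2, lambda w: 2 * w, w_star=0.0)
llc_singular = estimate_llc(lambda w: w**4, lambda w: 4 * w**3, w_star=0.0)

print(f"L(w) = w^2: estimated LLC {llc_regular:.3f} (theory 0.5)")
print(f"L(w) = w^4: estimated LLC {llc_singular:.3f} (theory 0.25)")
```

Discretization error and the localization term bias the numbers slightly, so the estimates should land near the theoretical values rather than exactly on them. The ordering is the useful part: the more degenerate minimum gets the lower coefficient.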

Open-source devinterp library (127 GitHub stars). 20+ talks at frontier labs and academic venues. Organized 6+ major conferences/workshops since 2023. MATS Summer 2026 mentoring stream.

Key People

Jesse Hoogland (Executive Director, co-founder): MSc Theoretical Physics (U Amsterdam), MATS 3.0/3.1 under Evan Hubinger, prior health-tech CTO. Responsible for outreach, operations, and research engineering. 20+ talks at OpenAI, DeepMind, Anthropic and other major venues. Raised $3.5M+ in grants.

Daniel Murfet (Research Director, co-founder): Algebraic geometer who left a tenured position at U Melbourne in early 2025 to join full-time. Established the SLT for AI safety research agenda. The core mathematical expertise behind Timaeus's approach. "A very robust argument supports the view that AGI will be dangerous."

Team: Grew from 4 at founding (Oct 2023) to 20+ by March 2026 (16 FTE by Jul 2025). Includes ~10 researchers and 5 engineers. Three advisors: Evan Hubinger (Anthropic), Davidad (ARIA), Adam Gleave (FAR AI). No public departures.

Money and Incentives

Total funding: $3.5M+ raised cumulatively through July 2025. 2025 budget: ~$2.5M.

Funding breakdown:

  • Open Philanthropy: $1,557,000 (Jan 2025, ~44% of cumulative total) -- two grants for operating expenses
  • Survival and Flourishing Fund: ~$1.05M across all rounds (estimated: ~$500K pre-2025, $276K S-Process 2025, $206K matching pledge, $70K speculation grant)
  • Manifund: ~$255K (initial $143K regrant + ~$112K additional)
  • LTFF, AISTOF: mentioned but amounts unknown
  • Jesse Hoogland: personal FTX Future Fund grant (amount unknown, for career transition)

Business model: 100% grants. No product revenue, no compute credits from labs, no venture investment.

Salaries: $70K-$140K/year full-time; $50-100/hour contract. Below market for AI safety research in the early period; likely improved after the Open Phil grant.

Incentive structure: Pure nonprofit with no commercial pressures. No lab affiliations beyond the advisory relationships (Hubinger at Anthropic, Gleave at FAR AI). UK AISI partnership is the only documented institutional relationship. The absence of lab compute dependencies or revenue is a structural advantage for independence.

Potential concerns: Open Phil is now the single largest funder. Advisor-funder overlap (Hubinger as both advisor and first major funder). No board of directors separate from leadership. No public financial filings (EIN unknown).

What Others Say

Strongest technical critique -- Joar Skalse (Nov 2023): SLT cannot explain generalization because it abstracts away the parameter-function map and Bayesian prior. "To understand the generalisation behaviour of a learning machine, we must understand its inductive bias. SLT abstracts away from both the prior and the parameter-function map. Hence, SLT is at its core unable to explain generalisation." Skalse concedes SLT may help with phase transitions and grokking, but denies it can serve as a unified theory of deep learning.

Even the advisor is skeptical -- Evan Hubinger (Timaeus advisor): "My modal outcome is that it's mostly just wrong and doesn't explain machine learning inductive biases that well. Inductive biases are very complex and most theories like this in the past have failed." Also: "I think Singular Learning Theory is a real contender for a theory that has a chance of effectively explaining and predicting the mechanisms behind machine learning inductive biases."

Constructive critique -- Beren Millidge (Dec 2025): SLT's insights may be more general than SLT proves. The real action may be in "pseudo-singularities" from SGD noise and finite training, not true mathematical singularities. SLT practitioners "should study the noisy stochastic optimization regime much more closely."

Community endorsement -- Zvi Mowshowitz (2025): Rates Timaeus "High" confidence with "High" funding need. "Slam dunk for interpretability funding." Notes excellent advisors and that "Evan, John Wentworth and Vanessa Kosoy have offered high praise, and there is evidence they have impacted top lab research agendas."

Theoretical gap acknowledged by founders -- Murfet: the bridge between Bayesian statistics (where SLT theorems are proven) and SGD training (how models are actually trained) is "not a theoretically justified step at this point in some rigorous sense." The empirical results are encouraging but the theoretical bridge is incomplete.

What's Absent

  • No 990 financial filings available; EIN unknown. Cannot verify financial details independently.
  • No board of directors separate from leadership team. Governance is opaque for a $2.5M+ org.
  • No specific documented examples of frontier labs adopting SLT techniques, despite talks at all major labs. The claimed "impact on lab research agendas" is unverifiable.
  • No published results from UK AISI partnership projects (due April 2025 per Feb 2025 update). Status unknown as of March 2026.
  • No formal published response to Skalse's generalization critique.
  • No documented institutional safeguards for capabilities risk, despite promising them in Oct 2023.
  • Alexander Gietelink Oldenziel's (co-founder) current status at Timaeus is unclear.

Recommended Reading

  1. AXRP Episode 31: Singular Learning Theory with Daniel Murfet (https://axrp.net/episode/2024/05/07/episode-31-singular-learning-theory-dan-murfet.html) -- 21K-word deep dive where Murfet explains SLT's foundations, honestly acknowledges theoretical gaps, and articulates alignment relevance. The most technically candid source available.

  2. Joar Skalse: "My Criticism of Singular Learning Theory" (https://www.lesswrong.com/posts/ALJYj4PpkqyseL7kZ/my-criticism-of-singular-learning-theory) -- The strongest technical critique. Essential counterpoint to the SLT enthusiasm.

  3. Hyper-Exponential: "Safe AI with Singular Learning Theory" (https://www.hyper-exponential.com/p/safe-ai-with-singular-learning-theory) -- Jesse tells the founding story and explains SLT in plain language. Best accessible introduction to what Timaeus is actually doing and why.

  4. "Timaeus in 2024" (https://www.lesswrong.com/posts/gGAXSfQaiGBCwBJH5/timaeus-in-2024) -- Most recent organizational update covering research progress, UK AISI partnership, and strategic direction.

  5. Cognitive Revolution: "Embryology of AI" (https://www.cognitiverevolution.ai/embryology-of-ai-how-training-data-shapes-ai-development-w-timaeus-jesse-hoogland-daniel-murfet/) -- Joint interview where Jesse and Daniel explain the developmental biology analogy and contrast with SAE-based interpretability.

Claude's analysis
An opinionated read. Read the brief first to form your own view.

Stated Theory of Change

Timaeus claims that Singular Learning Theory provides the mathematical foundation for understanding how training data determines model behavior through loss landscape geometry. The stated causal chain:

  1. Training data creates statistical patterns
  2. These patterns determine the geometry of the loss landscape (singularities, degeneracies)
  3. The geometry determines which algorithms SGD/Bayesian learning prefer (via the Local Learning Coefficient as a measure of "complexity")
  4. Understanding this chain enables (a) interpretability tools to detect when models learn dangerous structures, (b) alignment techniques that reliably write desired properties, and (c) evaluation methods that distinguish memorization from generalization
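
Step 3 of the chain above can be made concrete with the standard free-energy asymptotics from SLT (due to Watanabe). This is a sketch to leading order, not Timaeus's own formulation, and the notation is an assumption introduced here: n is the number of training samples, L_n the empirical loss (average negative log-likelihood), w^* a local minimum, \lambda(w^*) its local learning coefficient, and F_n(w^*) the Bayesian free energy of a neighbourhood of w^*. Lower-order terms are omitted.

```latex
\begin{align}
F_n(w^*) &\approx n\,L_n(w^*) + \lambda(w^*)\,\log n, \\
F_n(w_A) < F_n(w_B) &\iff n\,\bigl(L_n(w_A) - L_n(w_B)\bigr) < \bigl(\lambda(w_B) - \lambda(w_A)\bigr)\,\log n .
\end{align}
```

Read this way, a more degenerate solution (smaller \lambda) can carry more posterior mass at small n even with slightly higher loss, while a lower-loss but more complex solution takes over as n grows. The result is proven for Bayesian posteriors; whether it carries over to SGD-trained networks is the open question discussed elsewhere on this page.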

The mechanism through which this reduces AI risk: if alignment techniques work by engineering training data (RLHF, constitutional AI, DPO), then understanding the data-to-structure link tells you whether those alignment interventions are "deep" or "shallow" -- whether they will hold under distribution shift. SLT aims to provide that understanding.

The organizational path to impact: publish compelling research, give talks at frontier labs, build credibility, and eventually hand off validated techniques to labs and policymakers. Redwood Research's influence on lab RSPs is the explicit model.

Revealed Theory of Change

Timaeus's actions largely match the stated theory, with some nuances:

Consistent with stated theory:

  • Research output is genuine and growing in quality (ICLR Spotlight, AISTATS, best workshop paper)
  • The scaling work (toy models to billions of parameters) directly serves the stated goal of practical tools
  • UK AISI partnership demonstrates willingness to work on near-term applications, not just theory
  • Conference organizing and MATS mentoring build the SLT research community needed for long-term impact
  • Below-market salaries in the early period suggest genuine mission alignment

Nuances and potential divergences:

  • The emphasis on mathematical beauty and theoretical depth is genuine, not just instrumental. Murfet's motivation is partly "the study of learning machines is as deep as physics" -- the intellectual appeal of the mathematics is a real driver alongside the safety mission.
  • The research portfolio has diversified considerably beyond the original DevInterp proposal (susceptibilities, data attribution, compression, loss kernel). This could be productive exploration or scope creep.
  • The "alignment" prong (the "writing" side -- using SLT to improve alignment techniques) remains aspirational. No published results on using SLT to actually improve RLHF or other alignment techniques. The near-term work is all interpretability and evaluation.

Key Assumptions

1. SLT's Bayesian framework meaningfully describes SGD-trained neural networks

  • Evidence for: empirical correspondence between SLT predictions and observed behavior in toy models, deep linear networks, and small transformers. LLC changes track developmental stages even when hidden from loss curves.
  • Evidence against: the theoretical bridge is incomplete. Murfet: "not a theoretically justified step at this point in some rigorous sense." Beren Millidge argues the real action may be in "pseudo-singularities" from SGD noise rather than true mathematical singularities.
  • Testable: yes -- continued scaling to larger models will either maintain or lose the correspondence.
  • If wrong: the core framework is undermined, but the empirical tools (LLC estimation) might still be useful as heuristics even without the theoretical foundation.

2. Phase transitions are the right organizing principle for interpretability

  • Evidence for: clear developmental stages in small transformers and deep linear networks. Discovery of the multigram circuit using SLT tools.
  • Evidence against: Skalse's critique that SLT cannot explain generalization. The "too infrequent" failure mode (only large-scale structures form in phase transitions, everything else is gradual). In large models, transitions may be so localized that they're hard to detect.
  • Testable: yes -- the MATS stream and continued scaling work will test this.
  • If wrong: the interpretability tools may still provide useful signals without the phase transition framing being strictly correct.

3. SLT-derived tools will scale to frontier models

  • Evidence for: the scaling from toy to billions of parameters was achieved with sublinear sampling costs. "If you can train a model, you can sample from the local posterior."
  • Evidence against: accuracy of LLC estimates at scale is unverified. Murfet: "we don't know that the absolute values are correct when we do that, and most likely they're not, but I think we believe in the changes reflecting something real."
  • Testable: yes -- this is the current engineering push.
  • If wrong: Timaeus would need to find alternative tools or restrict to smaller models.

4. Frontier labs will adopt SLT techniques if they prove useful

  • Evidence for: 20+ talks at labs, praise from respected alignment researchers (Hubinger, Wentworth, Kosoy). Redwood Research precedent.
  • Evidence against: no documented adoption despite 2+ years of outreach. Labs have strong internal interpretability teams (Anthropic, DeepMind).
  • Testable: yes -- verifiable by whether labs cite or use SLT tools in their published work.
  • If wrong: Timaeus's impact would be limited to academic contributions and community building.

Strengths

  1. Intellectually honest leadership. Murfet's AXRP interview is remarkably candid about what SLT can and cannot do, what the theoretical gaps are, and where the speculative leaps are. Jesse acknowledges skepticism thoughtfully. This honesty is a reliable indicator of research integrity.

  2. Genuine mathematical depth. Murfet is one of very few people worldwide who deeply understand both algebraic geometry and its applications to deep learning. The SLT framework is not a rebranding of existing ideas -- it represents genuinely new mathematical content applied to AI.

  3. Rapid execution with limited resources. From $145K initial budget and 4 people to $2.5M budget, 16 FTE, and an ICLR Spotlight in under 2 years. The ratio of output to funding is impressive.

  4. No commercial incentive distortion. Pure nonprofit with no product revenue, no lab compute dependencies, no venture investment. This is structurally aligned with producing honest research rather than research that justifies a business model.

  5. Community building is a force multiplier. The conferences, MATS stream, Discord, and open-source library are building a research community around SLT that will outlast Timaeus itself. This is the right strategy for a theoretical research agenda.

  6. Diverse advisor network provides credibility. Hubinger (Anthropic), Davidad (ARIA UK), Gleave (FAR AI) span different approaches and institutions.

Weaknesses and Risks

  1. The theoretical bridge between Bayesian statistics and SGD is the load-bearing gap. If this bridge cannot be built (or if it turns out the Bayesian framework is fundamentally misleading about SGD dynamics), the theoretical foundation collapses. The empirical tools might survive but would lose their predictive grounding.

  2. No verified adoption by frontier labs. After 2+ years and 20+ lab talks, there is no public evidence that any lab has adopted SLT techniques in their safety pipeline. The claimed "impact on lab research agendas" is unverifiable.

  3. Skalse's generalization critique remains unanswered. The strongest published critique argues SLT cannot explain the thing it most needs to explain (why neural networks generalize), and Timaeus has not published a formal rebuttal. The comment thread discussion exists but is buried.

  4. The alignment applications are still aspirational. The "writing" side -- using SLT to actually improve alignment techniques -- has no published results. All concrete work is on interpretability and evaluation. If SLT never delivers alignment-improving tools, it becomes a fancy microscope rather than a solution.

  5. UK AISI project results are overdue. Promised by April 2025, no publications found as of March 2026. This is a concrete deliverable that would demonstrate practical utility.

  6. Governance is thin. No documented board of directors, no public conflict of interest policy, no financial filings. Common for young AI safety nonprofits but inappropriate for an org handling $2.5M+/year.

  7. Single-theory dependency. Timaeus's identity is SLT. If SLT turns out to be a mathematical curiosity that doesn't deliver practical safety tools, the org has no fallback position (unlike a lab that could pivot to other approaches).

Cross-References

  • Complementary to Anthropic's mechanistic interpretability: Timaeus explicitly positions SLT-based interpretability as "top-down" compared to SAEs' "bottom-up" approach. Daniel Murfet views them as likely to inform each other. Anthropic's Chris Olah has engaged with the ideas.
  • Aligned with ARIA/Davidad's agenda: Davidad is an advisor. His program at ARIA shares the emphasis on mathematical foundations for AI safety.
  • Related to ARC's approach: Both aim for mathematically grounded interpretability. Jesse's "Sweet Lesson" post places Timaeus alongside ARC-style interpretability.
  • Complements Redwood/Greenblatt control agenda: Timaeus's tools could strengthen evaluation methods used in AI control protocols.
  • Distinct from MIRI/agent foundations: Timaeus works within the deep learning paradigm rather than trying to develop alternative formal frameworks.

What Would Change This Assessment

Positive updates:

  • Published results from UK AISI partnership demonstrating practical utility
  • A frontier lab publicly citing SLT techniques in their safety work
  • Successful theoretical bridge between Bayesian posterior and SGD dynamics
  • LLC-based tools detecting a real safety-relevant property (e.g., deceptive planning) in a frontier-scale model

Negative updates:

  • Scaling to frontier models reveals LLC estimates are too noisy to be useful
  • A compelling theoretical result showing the Bayesian-SGD bridge is fundamentally blocked
  • Key researcher departures (especially Murfet)
  • Stagnation in publication quality or venue prestige after the initial upward trajectory
  • Failure to deliver on the MATS 2026 stream or other community commitments

Self-Critique

What sources should I have checked but didn't?

  • The original Watanabe textbooks on SLT would provide deeper mathematical context
  • The full comment thread on Skalse's critique (referenced but not fetched)
  • UK AISI's published reports on partnership projects
  • More recent 2026 publications that may exist on arxiv

Where is this analysis potentially biased?

  • I may be overly sympathetic to the mathematical elegance of SLT. The theory is intellectually appealing, which can create a halo effect.
  • The evidence base is heavily weighted toward Timaeus's own communications (blog posts, interviews) rather than independent assessments. Zvi's endorsement and Skalse's critique are the main independent voices.
  • I may underweight the difficulty of the Bayesian-SGD bridge because Timaeus's empirical results are encouraging. The theoretical gap could be more fundamental than current evidence suggests.

What would a thoughtful person who disagrees say? "SLT is beautiful mathematics applied to a simplified model of learning (Bayesian) that doesn't describe how real systems (SGD) actually work. The empirical correlations in toy models are interesting but could be coincidental at scale. Meanwhile, SAE-based interpretability and AI control are producing practical safety results NOW, without needing the theoretical foundations Timaeus is building. Timaeus is a research org that produces papers, not safety."

What's my single weakest claim? That SLT-derived tools will provide actionable safety insights for frontier-scale models. This is plausible but entirely undemonstrated. The scaling from toy to billions was a necessary step but not sufficient -- the tools need to actually find safety-relevant properties, not just developmental stages.

What information would most change my view? Results from the UK AISI partnership. If singular psychometrics can reliably distinguish memorization from generalization in a frontier model's eval performance, that would be the first concrete proof of safety-relevant utility. If the results show SLT tools don't provide actionable signal at scale, that would be a significant negative update.

Connected to (11)

  • Sydney Mathematical Research Institute: collaborator
  • MATS: collaborator · Jesse Hoogland
  • Monash University: collaborator
  • PIBBSS: collaborator
  • UK AI Safety Institute: collaborator
  • Anthropic: advisor at · Evan Hubinger
  • ARIA: advisor at · David Dalrymple
  • Catalyze Impact: staff from · Alexandra Bos
  • FAR AI: advisor at · Adam Gleave
  • University of Melbourne: collaborator · Daniel Murfet
  • David Krueger Lab: staff from · Jesse Hoogland

Sources (39)

Every URL that was read during research.

  1. Timaeus | Breakthrough Scientific Progress on AI Safety (timaeus.co)
  2. Timaeus | From Theory to Practice (timaeus.co)
  3. Timaeus | Learn about SLT (timaeus.co)
  4. 38.2 - Jesse Hoogland on Singular Learning Theory (axrp.net)
  5. 31 - Singular Learning Theory with Daniel Murfet (axrp.net)
  6. Embryology of AI: How Training Data Shapes AI Development w/ Timaeus' Jesse Hoogland & Daniel Murfet (cognitiverevolution.ai)
  7. AI DOOM: Jesse Hoogland of Timaeus, Manifold episode 106 (stevehsu.substack.com)
  8. Manifold | AI DOOM: Jesse Hoogland of Timaeus (manifold1.com)
  9. Safe AI with Singular Learning Theory .. (hyper-exponential.com)
  10. Hi, I'm Jesse (jessehoogland.com)
  11. Talks (jessehoogland.com)
  12. My Criticism of Singular Learning Theory (greaterwrong.com)
  13. Announcing Timaeus (greaterwrong.com)
  14. Timaeus's First Four Months (greaterwrong.com)
  15. Timaeus in 2024 (greaterwrong.com)
  16. GitHub - timaeus-research/devinterp: Tools for studying developmental interpretability in neural networks. (github.com)
  17. SFF-2025 S-Process Recommendations Announcement | Survival and Flourishing Fund (survivalandflourishing.fund)
  18. Towards Developmental Interpretability (greaterwrong.com)
  19. Theoretical And Empirical Aspects Of Singular Learning Theory For AI Alignment (simons.berkeley.edu)
  20. The Big Nonprofits Post 2025 (thezvi.wordpress.com)
  21. Initial Quick Thoughts on Singular Learning Theory (beren.io)
  22. My impression of singular learning theory (greaterwrong.com)
  23. Dialogue introduction to Singular Learning Theory (greaterwrong.com)
  24. ILIAD Conference (iliadconference.com)
  25. Jesse Hoogland (jessehoogland.com)
  26. SLT for AI Safety (greaterwrong.com)
  27. Jesse Hoogland on Developmental Interpretability and Singular Learning Theory (theinsideview.ai)
  28. Dan Murfet, Jesse Hoogland at MATS: Summer 2026 (matsprogram.org)
  29. The Sweet Lesson: AI Safety Should Scale With Compute (greaterwrong.com)
  30. Alexander Strang & Alexander Gietelink Oldenziel | Advance AI Research Today — Pivotal (pivotal-research.org)
  31. My hopes for alignment: Singular learning theory and whole brain emulation (greaterwrong.com)
  32. Will Tassilo Think Singular Learning Theory Isn’t Useful for Alignment by the End of 2024? (manifold.markets)
  33. Developmental Interpretability (devinterp.com)
  34. Shallow review of technical AI safety, 2024 (greaterwrong.com)
  35. What makes a good "regrant"? (manifund.substack.com)
  36. Neural networks generalize because of this one weird trick (greaterwrong.com)
  37. Loss Landscape Degeneracy and Stagewise Development in Transformers (arxiv.org)
  38. The Local Learning Coefficient: A Singularity-Aware Complexity Measure (arxiv.org)
  39. Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient (arxiv.org)