EleutherAI

Research

Open-source AI research community.

Founded: 2020
HQ: Washington, DC
Team: 36
Structure: 501(c)(3) nonprofit
Funding model: Grants

Theory of Change

EleutherAI's theory of change has evolved significantly since its founding.

Original (2020-2022): Open-sourcing large language models enables safety research that would otherwise be locked behind corporate APIs. Co-founder Connor Leahy articulated three arguments: (1) important safety research requires access to model internals, not just API access; (2) withholding models from bad actors is futile because "the only secret was that it was possible"; (3) security through obscurity is not real security. The org explicitly stated it would never release a model beyond the frontier.

Current (2023-present): Nonprofit research on interpretability, alignment, and AI governance, conducted independently of commercial incentives. Stella Biderman (Feb 2025): "We need there to be people who are experts in the development of this technology, who do research on it and promote access to it but who don't have a vested financial interest in its commercial success." The pivot was driven by other organizations (Meta, Mistral) taking over large-scale open model releases, freeing EleutherAI to pursue "the research we wanted to use these models for to begin with."

What They Do

Safety research. The most significant recent output is Deep Ignorance (Aug 2025, with UK AISI and Oxford): demonstrated that filtering pretraining data to remove dangerous knowledge creates tamper-resistant safeguards in open-weight models. Filtered models cannot regenerate dangerous knowledge even when fine-tuned on unsafe data -- the first credible defense against the "just fine-tune away the safety training" attack on open models. However, the paper honestly notes this does not prevent in-context retrieval via RAG. Best Paper Runner-up at NeurIPS 2025 workshop.

Interpretability research. Mechanistic Anomaly Detection (0.95 AUROC on toy deception-detection tasks, with honest reporting that it doesn't generalize to all models/tasks). ELK research (VINC-S, eliciting latent knowledge from model activations). LEACE (provable concept erasure). SAE feature analysis (including an important negative result showing SAEs don't learn consistent features across seeds). Automated interpretability for SAE features. 130+ publications in top ML venues (NeurIPS, ICML, ICLR, EMNLP, Nature, ACL).
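For readers unfamiliar with the metric, the 0.95 AUROC figure can be read as "a randomly chosen anomalous example receives a higher anomaly score than a randomly chosen normal one about 95% of the time." The sketch below computes AUROC from that pairwise definition. The scores and labels are hypothetical illustration data, not results from the Mechanistic Anomaly Detection work, and the function is a generic implementation rather than EleutherAI's evaluation code.

```python
def auroc(scores, labels):
    """AUROC via its pairwise definition: the fraction of (positive, negative)
    pairs in which the positive has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores (higher = more suspicious); label 1 = anomalous.
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

print(f"AUROC = {auroc(scores, labels):.3f}")  # 13 of 15 pairs ranked correctly
```

A detector that scored at random would land near 0.5, which is why a result on toy tasks still needs the generalization caveat noted above.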

Infrastructure. lm-evaluation-harness (11,900+ GitHub stars) is the backend for Hugging Face's Open LLM Leaderboard and is used by NVIDIA, Cohere, Google, and Microsoft. Before it existed, labs ran evaluations differently, which made meaningful comparison difficult. The Pythia model suite (28 models, identical data/order, 154 checkpoints) was specifically designed to enable scientific research on training dynamics. Common Pile v0.1 (8TB, licensed/public domain text) addresses copyright concerns from the original Pile. EleutherAI's models have been downloaded over 70 million times.
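The 154-checkpoint figure for Pythia follows from its checkpoint schedule. The sketch below reconstructs that schedule as I understand it from the Pythia documentation: one checkpoint at step 0, ten log-spaced checkpoints (steps 1 through 512), and one every 1,000 steps up to step 143,000. Treat the schedule and the "stepN" revision naming as assumptions to verify against the Pythia repository; the function name is hypothetical.

```python
def pythia_checkpoint_steps(final_step: int = 143_000, interval: int = 1_000) -> list[int]:
    """Training steps at which checkpoints are assumed to have been saved."""
    log_spaced = [2 ** k for k in range(10)]  # 1, 2, 4, ..., 512
    evenly_spaced = list(range(interval, final_step + 1, interval))  # 1000 ... 143000
    return [0] + log_spaced + evenly_spaced


steps = pythia_checkpoint_steps()
# Assumed naming convention for per-checkpoint revisions (e.g. "step3000").
revisions = [f"step{s}" for s in steps]

print(len(revisions))   # 1 + 10 + 143 checkpoints
print(revisions[:12])   # the dense early checkpoints
print(revisions[-1])    # the final checkpoint
```

The dense early checkpoints are what make the suite useful for training-dynamics research: behavior that emerges in the first few hundred steps can be compared across model sizes at matching points in training.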

Policy. Joint letter with Mozilla and Hugging Face opposing SB 1047 (arguing it would harm open-source AI). EU AI Act advocacy for open-source exemptions. Stella Biderman provided Senate testimony on open-source AI (Sept 2023). EleutherAI positions itself as the policy voice for the open-source AI safety community.

Alignment. Alignment MineTest project (Curtis Huebner): studying corrigibility and embedded agency in RL environments. Weak-to-strong generalization experiments. Reward hacking research. Third-party evaluation framework with OpenMined for privacy-preserving safety evaluations.

Key People

Stella Biderman -- Executive Director and Head of Research. Mathematician, formerly also Lead Scientist at Booz Allen Hamilton (a major defense contractor). Declined a $100M VC deal to preserve nonprofit status. ICLR 2025 keynote speaker. Senate testimony on open-source AI. Author of "How Large Language Models Work" (Simon & Schuster). The decision-maker and public face of the org.

Nora Belrose -- Head of Interpretability. Self-described "AI optimist" (~1-2% p(doom)). Funded by Open Philanthropy ($2.6M over 4 years). No CS degree (BA in political science/linguistics). Self-taught ML researcher. Argues alignment is likely solvable with current techniques. Her research on ELK, LEACE, and concept erasure represents the technical core of EleutherAI's interpretability program.

Notable departures. All three co-founders left. Connor Leahy and Sid Black founded Conjecture (an alignment startup; Leahy estimates ~90% p(doom)). Leo Gao went to OpenAI's alignment team. Leahy attributed his departure to the difficulty of getting focused research done in a volunteer organization; he remains on the board. Still at the org: Curtis Huebner (Head of Alignment, ~90% p(doom)), who leads the MineTest project.

Team: ~24 full/part-time staff + ~12 regular volunteers. Staff regularly poached by OpenAI, Anthropic, Google at 3-4x salary.

Money and Incentives

Total budget: Under $3 million annually (as of Feb 2025). For context, this is less than half of what DeepSeek claimed to spend training a single chatbot.

Funding sources: Open Philanthropy ($2.6M, 4-year grant for Nora Belrose's interpretability work -- nearly equal to one year's entire budget), Omidyar Network, Hugging Face, Stability AI, Canva, Nat Friedman, Lambda Labs, Jack Dorsey's Start Small, Mozilla Foundation.
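The comparison between the grant and the budget depends on whether the grant is read as a lump sum or annualized. A minimal sketch of the arithmetic, using only the figures quoted above and assuming (this is not stated in the sources) that the grant is disbursed evenly over its four years and that the budget sits at its $3M upper bound:

```python
# Figures quoted in this profile; both are approximate.
ANNUAL_BUDGET_USD = 3_000_000  # "under $3 million", so this is an upper bound
GRANT_TOTAL_USD = 2_600_000    # Open Philanthropy grant, total over its term
GRANT_YEARS = 4


def annualized(total: float, years: int) -> float:
    """Spread a multi-year grant evenly across its term (an assumption)."""
    return total / years


def share_of_budget(amount: float, budget: float) -> float:
    """Express an amount as a fraction of the annual budget."""
    return amount / budget


grant_per_year = annualized(GRANT_TOTAL_USD, GRANT_YEARS)
total_vs_budget = share_of_budget(GRANT_TOTAL_USD, ANNUAL_BUDGET_USD)
annual_vs_budget = share_of_budget(grant_per_year, ANNUAL_BUDGET_USD)

print(f"Grant per year: ${grant_per_year:,.0f}")
print(f"Total grant / annual budget: {total_vs_budget:.0%}")
print(f"Annualized grant / annual budget: {annual_vs_budget:.0%}")
```

Because the real budget is below $3M, the true shares are somewhat higher than these figures. Either way, the grant is a large single line item for one program.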

Compute: Entirely dependent on donated compute. Google TRC (TPUs for GPT-Neo, GPT-J), CoreWeave (GPUs for GPT-NeoX, Deep Ignorance), Stability AI (AWS cluster share), Prime Intellect, UK academic clusters. EleutherAI cannot do large-scale training without donated resources.

Business model: Grants and donations from a mix of philanthropic funders and commercial AI companies. No product revenue. No VC investment (explicitly declined).

The $100M VC test: An unnamed prominent VC offered to raise $100M+ if EleutherAI dropped its 501(c)(3) status. Biderman declined. This is the strongest evidence of genuine mission commitment.

Incentive concerns: (1) Several funders (Hugging Face, Stability AI, Canva) are commercial AI companies that benefit from open-source AI -- their interests align with EleutherAI's mission but could create pressure against safety positions that would restrict open-source releases. (2) The Open Philanthropy grant is earmarked for one researcher's program, creating funding concentration. (3) Stella Biderman's former dual role at Booz Allen Hamilton (a defense contractor) has received no public scrutiny. (4) Staff regularly poached at 3-4x salary, making EleutherAI a de facto training pipeline for frontier labs.

Legal structure: 501(c)(3) nonprofit (EleutherAI Institute), incorporated early 2023. EIN: 92-2215190. No 990 filings yet digitized, so independent verification of finances is not possible.

What Others Say

The case against open-source AI (IEEE Spectrum, Jan 2024): "Open-Source AI Is Uniquely Dangerous." Safety features can be stripped from open models (creating "uncensored" derivatives), models proliferate irreversibly, and developers lose all control post-release. Specific citation: WormGPT and FraudGPT, dark web cybercrime tools, were built on EleutherAI's GPT-J. The author recommends pausing releases of unsecured AI and establishing developer liability for foreseeable misuse.

GPT-J misuse: FraudGPT and WormGPT are "widely available on dark web markets. Both are based on the open-source large language model GPT-J developed by EleutherAI in 2021" (IBM). No public EleutherAI response to this specific misuse has been found.

Co-founder's self-critique (Connor Leahy): "I think there's absolutely a world in which EleutherAI is net negative. I do not want to deny that." He specifically identifies releasing The Pile as potentially a mistake, while defending model releases as always below the frontier. He left because getting focused alignment research done was too hard in a volunteer org.

The cultural bridge argument (Connor Leahy): "People in the ML community really respect EleutherAI... And then they also say, well, these guys take alignment really seriously. Well, maybe we should take it seriously too." Claims 4 of 5 core EleutherAI people went into alignment work.

The Pile copyright controversy: Contained Books3 (copyrighted books) and YouTube subtitles from 170,000+ videos. Wired/Proof News investigation (July 2024). EleutherAI responded with Common Pile (licensed/public domain only).

Hacker News skepticism about nonprofit transition: Multiple commenters drew parallels to OpenAI's trajectory. Biderman acknowledges the question "How do we know you won't pull an OpenAI?" is one she's gotten used to answering.

What's Absent

No 990 financial data available for independent verification of claims about budget, compensation, or spending.
No public response from EleutherAI to the specific misuse of GPT-J for WormGPT/FraudGPT.
Current board composition unknown after Emad Mostaque's departure from Stability AI.
No formal conflict of interest disclosure policy documented despite board members' ties to Conjecture and Stability AI, and ED's former role at Booz Allen Hamilton.
No published annual report or financial transparency document.
No clear public articulation of where the current capability threshold for withholding models lies.
Post-Stability AI compute situation undocumented.

Stated Theory of Change

EleutherAI's theory of change has gone through two distinct phases:

Phase 1 (2020-2022): Open-sourcing below-frontier AI models enables safety research that would otherwise be impossible. The causal chain: (a) Large models exhibit emergent capabilities that small models don't, (b) Safety-relevant research requires hands-on access to model internals, not just API calls, (c) Only a few corporations can afford to train large models, (d) Therefore, releasing open-weight models below the frontier democratizes safety research without meaningfully advancing the capability frontier, because (e) Anyone with resources to be dangerous already has access to these capabilities.

Phase 2 (2023-present): Independent nonprofit research on interpretability, alignment, evaluation, and open-weight safety, free from commercial incentives. The causal chain: (a) As AI becomes more powerful, independent researchers who understand the technology but lack financial incentives tied to its commercial success are essential, (b) Open-source infrastructure (evaluation harnesses, research model suites, datasets) is a public good that enables the broader safety ecosystem, (c) Novel safety techniques for open-weight models (like pretraining data filtering) address the real risks of model proliferation without requiring centralized control.

The pivot was explicitly acknowledged: "there is substantially more interest in training and releasing LLMs than there once was," freeing EleutherAI to do the safety research they always intended.

Revealed Theory of Change

Looking at where EleutherAI actually spends its resources and what it produces, several patterns emerge:

Infrastructure over theory. The highest-impact outputs are infrastructure: lm-eval-harness (de facto standard for LLM evaluation), The Pile and Common Pile (widely-used training datasets), Pythia (model suite designed for scientific research). These have far more real-world adoption than any specific alignment research result. The revealed priority is "build the tools that make safety research possible for everyone" rather than "solve alignment ourselves."

Interpretability over alignment. The $2.6M Open Philanthropy grant is for interpretability (Nora Belrose's work). The alignment work (Curtis Huebner's MineTest project) appears smaller-scale. The blog posts and publications lean heavily toward interpretability (SAEs, MAD, ELK, LEACE, concept erasure) rather than alignment (corrigibility, reward hacking). This makes sense given Belrose's optimistic worldview -- if you think alignment is relatively easy, interpretability tools are more valuable than alignment theory.

Policy advocacy aligned with organizational survival. EleutherAI's policy positions (opposing SB 1047, advocating for open-source exemptions in the EU AI Act) are genuinely held beliefs but also directly necessary for organizational survival. If open-source AI were banned or heavily regulated, EleutherAI's entire operating model would collapse. This is not cynical -- the alignment between self-interest and stated beliefs is a strength, not a weakness. But it should be noted.

Open-weight safety as a unique niche. Deep Ignorance positions EleutherAI in a unique niche: the only serious research organization working on making open-weight models safer. Everyone else either (a) opposes open-weight models, or (b) releases them without novel safety techniques. This niche is both strategically smart and genuinely important.

Key Assumptions

1. Open-source AI is net positive for safety.

Evidence for: Enables independent safety research; lm-eval-harness as evaluation infrastructure; Pythia enabling training dynamics research; Deep Ignorance showing open-weight-specific safety techniques are possible.
Evidence against: GPT-J used for WormGPT/FraudGPT; "uncensored" derivatives of every open model; irreversible proliferation. The IEEE Spectrum argument that developers lose all control post-release.
Testable? Partially. The Deep Ignorance results are testable and have been tested. The broader "net positive" claim requires counterfactual reasoning that is inherently untestable.
If wrong: EleutherAI's entire Phase 1 was net negative, and their current advocacy for open-source exemptions in AI regulation is actively harmful. They would need to pivot to closed-source safety research or policy advocacy against open-source releases.

2. A $3M budget can produce meaningful safety research.

Evidence for: Deep Ignorance is a genuine contribution. 130+ publications. lm-eval-harness is industry-standard. The Pythia suite is widely used.
Evidence against: Frontier safety research increasingly requires frontier compute. Staff poached at 3-4x salary. Cannot compete with well-funded labs on compute-intensive interpretability work.
Testable? Yes -- track citations, adoption, and whether their techniques get used by frontier labs.
If wrong: EleutherAI becomes a talent pipeline for frontier labs rather than an independent research force.

3. Intellectual diversity (1% vs. 90% p(doom)) within leadership is a feature, not a bug.

Evidence for: Different worldviews lead to different research bets, increasing the portfolio's coverage. The org has produced both "AI optimist" work (ELK/LEACE assuming current techniques may suffice) and "AI pessimist" work (MAD for detecting deception, corrigibility research).
Evidence against: Without a shared threat model, prioritization becomes incoherent. If leadership can't agree on whether alignment is easy or nearly impossible, the research agenda is more "whatever each team leader wants" than strategically directed.
Testable? Difficult. Would require understanding internal resource allocation decisions.
If wrong: The org's research is scattered rather than focused, and would be more impactful with a shared worldview.

4. Nonprofit structure is sustainable for AI research.

Evidence for: The $100M VC rejection demonstrates commitment. 501(c)(3) status prevents equity distribution. Several AI nonprofits (AI2, MIRI, Redwood) have sustained themselves for years.
Evidence against: OpenAI's trajectory. Staff poaching at 3-4x salary. Compute dependency on commercial donors. The "OpenAI question" from funders.
Testable? Track fundraising, staff retention, and whether the org maintains its nonprofit status over 5+ years.
If wrong: EleutherAI either collapses due to talent drain, or eventually converts to for-profit, repeating the OpenAI pattern.

Strengths

1. Unique credibility as a bridge. EleutherAI has genuine respect in both the ML community and the AI safety community. Connor Leahy: "EleutherAI is not really EA culture. It has a lot of DNA of EA and rationalists... but it's its own thing. It's like a mutant offspring between the ML world and the rationalism world." This bridging role is genuinely rare and valuable.

2. Infrastructure impact. lm-eval-harness, The Pile, Common Pile, and Pythia are foundational infrastructure used by the entire field. This creates leverage -- EleutherAI's evaluation framework shapes how everyone measures model quality, which is a form of governance.

3. Deep Ignorance is a real contribution. The first evidence that pretraining data filtering creates tamper-resistant safeguards in open-weight models. This directly addresses the strongest critique of open-source AI and creates a new research direction.

4. Demonstrated mission commitment. Declining $100M in VC funding is the strongest possible signal. Employees accept salaries 3-4x below what frontier labs offer. Nonprofit status acts as a structural commitment.

5. Lean operations with high output. 130+ publications and industry-standard tools on a sub-$3M budget is remarkably efficient. Per-dollar research output likely exceeds most AI safety organizations.

Weaknesses and Risks

1. GPT-J misuse is an unaddressed reputational liability. FraudGPT and WormGPT were built on GPT-J. EleutherAI has never publicly addressed this. For an org whose theory of change rests on "open-source is net positive for safety," the absence of any response to concrete misuse is a significant gap.

2. Financial precarity. Under $3M budget, dependent on donated compute, staff poached at 3-4x salary. The org is one major donor withdrawal or compute disruption away from crisis. The $2.6M Open Philanthropy grant for Nora Belrose's program nearly equals one year's entire budget in total; spread over its 4-year term it is roughly $650K per year, more than a fifth of annual spending. If that ends without a replacement, it's catastrophic.

3. Governance opacity. 3-person board with potential conflicts (one member founded a separate company, another ran a major donor). Current board composition unknown. No 990 data available. No annual report. No COI disclosure policy documented. For a nonprofit that advocates for transparency, this is ironic.

4. Talent pipeline risk. If EleutherAI's primary function becomes training researchers who then leave for frontier labs, it's a pipeline, not an independent research org. The 3-4x salary differential means every researcher is perpetually one offer away from departure.

5. Incoherent threat model. When the Head of Interpretability thinks p(doom) is 1-2% and the Head of Alignment thinks it's 90%, research prioritization becomes unclear. This isn't necessarily bad (portfolio diversification), but it means the org doesn't have a unified strategic direction.

Cross-References

Complements: MIRI (more theoretical alignment), Redwood Research (more empirical alignment), Anthropic (frontier safety research with commercial model). EleutherAI fills the niche of open-source safety infrastructure and open-weight-specific safety techniques that none of these others occupy.

Competes with: No direct competitor. The closest analogue is AI2 (nonprofit AI research), but AI2 has a much larger budget and focuses on different problems.

Enables: EleutherAI's infrastructure (lm-eval-harness, Pythia) is used by virtually every AI safety research group. Its models were used by Redwood Research, David Bau's lab, Databricks, and many others.

Pipeline to: Conjecture (Connor Leahy, Sid Black), OpenAI alignment team (Leo Gao), Anthropic, Google (via staff poaching). EleutherAI feeds talent into the frontier lab ecosystem, which is a mixed outcome.

What Would Change This Assessment

EleutherAI publicly addresses GPT-J misuse and articulates a clear policy on responsibility for downstream misuse of open models. Would significantly improve assessment of organizational maturity.
Deep Ignorance techniques are adopted by a frontier lab for open-weight model releases. Would validate the entire Phase 2 theory of change.
990 data becomes available showing financial details that either confirm or contradict the sub-$3M narrative.
Another major researcher departs for a frontier lab. Would confirm the talent pipeline concern.
Board expansion with genuinely independent directors. Would address the governance opacity.
A capability threshold is articulated -- under what conditions would EleutherAI stop releasing models or advocate against open-source releases?

Self-Critique

Weakest claim: My assessment that EleutherAI's internal worldview diversity is a "feature not a bug." It's plausible that the 88-percentage-point disagreement between Belrose and Huebner on p(doom) leads to genuinely incoherent research prioritization rather than healthy diversification. I lack visibility into how internal resource allocation decisions are actually made.

What I might be wrong about: I may be overweighting the GPT-J misuse concern. WormGPT and FraudGPT existed briefly and were relatively unsophisticated -- the argument that these tools would have been built with or without GPT-J has some merit. The actual marginal harm from EleutherAI's specific model releases may be small.

Sources I should have checked: Stella Biderman's full Senate testimony (PDF was not extractable). ICLR 2025 keynote content. Internal Slack/Discord discussions about research prioritization (not accessible). 990 filings (not yet available).

Potential bias: I may be somewhat favorable toward EleutherAI because the "scrappy nonprofit doing good work on a shoestring" narrative is appealing. The org's transparency about limitations (Deep Ignorance honestly reporting RAG bypass, MAD honestly reporting generalization failures) creates trust that might cause me to give them more credit than warranted on unverifiable claims (like the sub-$3M budget).

What a thoughtful critic would say: "EleutherAI is a well-intentioned organization that has done real damage by normalizing open-source release of capable AI models, creating infrastructure (The Pile) that powered models they couldn't control, and now advocates for policies that serve their organizational survival while packaging them as safety positions. Deep Ignorance is promising but addresses only one attack vector, and the org has never reckoned with the actual harm caused by WormGPT/FraudGPT."

Connected to (12)

UK AI Safety Institute -- collaborator
OpenMined -- collaborator
Fund for Alignment Research -- staff from · Nora Belrose
Conjecture -- spun off from · Connor Leahy, Sid Black
OpenAI -- staff to · Leo Gao
University of Oxford -- collaborator
Hugging Face -- collaborator
Mozilla -- collaborator
Booz Allen Hamilton -- staff from · Stella Biderman
Databricks -- collaborator
CarperAI -- spun off from · Louis Castricato
Stability AI -- collaborator

Sources (65)

Every URL that was read during research.

1. EleutherAI - Wikipedia (en.wikipedia.org)
2. About — EleutherAI (eleuther.ai)
3. EleutherAI: When OpenAI Isn’t Open Enough (spectrum.ieee.org)
4. Stability AI, Hugging Face and Canva back new AI research nonprofit | TechCrunch (techcrunch.com)
5. Why Release a Large Language Model? (blog.eleuther.ai)
6. What A Long, Strange Trip It's Been: EleutherAI One Year Retrospective (blog.eleuther.ai)
7. The View from 30,000 Feet: Preface to the Second EleutherAI Retrospective (blog.eleuther.ai)
8. Connor Leahy - Wikipedia (en.wikipedia.org)
9. CoreWeave, EleutherAI & NovelAI Launch GPT-NeoX-20B and GooseAI (coreweave.com)
10. Stella Biderman (stellabiderman.ai)
11. Mozilla, EleutherAI, and Hugging Face Provide Comments on California’s SB 1047 – Open Policy & Advocacy (blog.mozilla.org)
12. EleutherAI announces it has become a non-profit (news.ycombinator.com)
13. EleutherAI went from Discord coders to a truly open AI research organization (the-decoder.com)
14. Stella Biderman: How EleutherAI Trains and Releases LLMs (wandb.ai)
15. Press Releases | Booz Allen Hamilton Inc. (boozallen.com)
16. Alignment Research @ EleutherAI (blog.eleuther.ai)
17. Interpretability — EleutherAI (eleuther.ai)
18. Papers — EleutherAI (eleuther.ai)
19. Open-Source AI Is Uniquely Dangerous (spectrum.ieee.org)
20. Evaluation Harness Is Setting the Benchmark for Auditing Large Language Models (mozillafoundation.org)
21. Evaluating LLMs — EleutherAI (eleuther.ai)
22. The Pile (dataset) - Wikipedia (en.wikipedia.org)
23. What I learned from the creators of The Pile, one of the world's largest AI training datasets (sharongoldman.substack.com)
24. Staff — EleutherAI (eleuther.ai)
25. EleutherAI (eleuther.ai)
26. Pretraining Data Filtering for Open-Weight AI Safety (blog.eleuther.ai)
27. Blog (blog.eleuther.ai)
28. EleutherAI Second Retrospective: The long version (blog.eleuther.ai)
29. EleutherAI's Thoughts on the EU AI Act (blog.eleuther.ai)
30. How the Foundation Model Transparency Index Distorts Transparency (blog.eleuther.ai)
31. The Common Pile v0.1 (blog.eleuther.ai)
32. Open Source Automated Interpretability for Sparse Autoencoder Features (blog.eleuther.ai)
33. Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times (blog.eleuther.ai)
34. Experiments in Weak-to-Strong Generalization (blog.eleuther.ai)
35. Reward Hacking Research Update (blog.eleuther.ai)
36. Connor Leahy on Dignity and Conjecture (theinsideview.ai)
37. Curtis Huebner on H100 orders and AI timelines (theinsideview.ai)
38. Minetester: A fully open RL environment built on Minetest (blog.eleuther.ai)
39. #7: Nora Belrose (theojaffee.com)
40. Inside the Mind of AI (Robert Wright & Nora Belrose) (nonzero.org)
41. Third-party evaluation to identify risks in LLMs’ training data (blog.eleuther.ai)
42. Hacker News (news.ycombinator.com)
43. Unknown (arxiv.org)
44. RLHF and RLAIF in GPT-NeoX (blog.eleuther.ai)
45. 'We are super, super fucked': Meet the man trying to stop an AI apocalypse (sifted.eu)
46. Connor Leahy on AI Safety and Why the World is Fragile - Future of Life Institute (futureoflife.org)
47. Alignment — EleutherAI (eleuther.ai)
48. Alignment MineTest — EleutherAI (eleuther.ai)
49. What is EleutherAI? (androidpolice.com)
50. EleutherAI releases massive AI training dataset of licensed and open domain text | TechCrunch (techcrunch.com)
51. EleutherAI: Going Beyond "Open Science" to "Science in the Open" (arxiv.org)
52. ANTHOLOGY — Open source AI with Beyang Liu, Denny Lee & Stella Biderman at OSS NA 2023 (Changelog Interviews #541) (changelog.com)
53. Currents 087: Shivanshu Purohit on Open-Source Generative AI - The Jim Rutt Show (jimruttshow.com)
54. Mechanistic Anomaly Detection Research Update (blog.eleuther.ai)
55. Mechanistic Anomaly Detection Research Update 2 (blog.eleuther.ai)
56. Eliciting Latent Knowledge — EleutherAI (eleuther.ai)
57. Community — EleutherAI (eleuther.ai)
58. Attention Probes (blog.eleuther.ai)
59. The Foundation Model Development Cheatsheet (blog.eleuther.ai)
60. SAEs trained on the same data don’t learn the same features (blog.eleuther.ai)
61. VINC-S: Closed-form Optionally-supervised Knowledge Elicitation with Paraphrase Invariance (blog.eleuther.ai)
62. Free Form Least-Squares Concept Erasure Without Oracle Concept Labels (blog.eleuther.ai)
63. Partially rewriting an LLM in natural language (blog.eleuther.ai)
64. How Philanthropy Built, Lost, and Could Reclaim the A.I. Race (philanthropy.com)
65. Open source, open risks: The growing dangers of unregulated generative AI | IBM (ibm.com)