Theory of Change
EleutherAI's theory of change has evolved significantly since its founding.
Original (2020-2022): Open-sourcing large language models enables safety research that would otherwise be locked behind corporate APIs. Co-founder Connor Leahy articulated three arguments: (1) important safety research requires access to model internals, not just API access; (2) withholding models from bad actors is futile because "the only secret was that it was possible"; (3) security through obscurity is not real security. The org explicitly stated it would never release a model beyond the frontier.
Current (2023-present): Nonprofit research on interpretability, alignment, and AI governance, conducted independently of commercial incentives. Stella Biderman (Feb 2025): "We need there to be people who are experts in the development of this technology, who do research on it and promote access to it but who don't have a vested financial interest in its commercial success." The pivot was driven by other organizations (Meta, Mistral) taking over large-scale open model releases, which freed EleutherAI to pursue "the research we wanted to use these models for to begin with."
What They Do
Safety research. The most significant recent output is Deep Ignorance (Aug 2025, with UK AISI and Oxford), which demonstrated that filtering pretraining data to remove dangerous knowledge creates tamper-resistant safeguards in open-weight models. In the paper's experiments, filtered models did not regain the dangerous knowledge even when fine-tuned on unsafe data -- the first credible defense against the "just fine-tune away the safety training" attack on open models. However, the paper honestly notes that filtering does not prevent in-context retrieval via RAG. The paper was Best Paper Runner-up at a NeurIPS 2025 workshop.
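To make the idea of pretraining data filtering concrete, here is a minimal sketch of the simplest possible version: dropping documents that contain blocklisted terms before training. This is an illustration only, not the Deep Ignorance pipeline, which this profile does not describe in detail. The blocklist terms, the `min_hits` threshold, and the function names `is_flagged` and `filter_corpus` are all hypothetical.

```python
from typing import Iterable, List, Set

# Hypothetical placeholder terms. A real blocklist would be curated by
# domain experts and would likely be paired with a trained classifier.
BLOCKLIST: Set[str] = {"hazard_term_a", "hazard_term_b"}


def is_flagged(document: str, blocklist: Set[str] = BLOCKLIST, min_hits: int = 1) -> bool:
    """Return True if the document contains at least `min_hits` blocklisted tokens."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in document.lower().split())
    hits = sum(1 for tok in tokens if tok in blocklist)
    return hits >= min_hits


def filter_corpus(documents: Iterable[str], blocklist: Set[str] = BLOCKLIST) -> List[str]:
    """Keep only documents that are not flagged, preserving their order."""
    return [doc for doc in documents if not is_flagged(doc, blocklist)]


corpus = [
    "A benign article about cooking pasta.",
    "Detailed notes on hazard_term_a handling.",
]
clean_corpus = filter_corpus(corpus)
```

The key design point is that removal happens before training, so the model never sees the flagged text. That is the property the tamper-resistance claim rests on. It also shows why the RAG caveat holds: text removed from the training set can still be supplied in the context window at inference time.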
Interpretability research. Mechanistic Anomaly Detection (0.95 AUROC on toy deception-detection tasks, with honest reporting that it doesn't generalize to all models/tasks). ELK research (VINC-S, eliciting latent knowledge from model activations). LEACE (provable concept erasure). SAE feature analysis (including an important negative result showing SAEs don't learn consistent features across seeds). Automated interpretability for SAE features. 130+ publications in top ML venues (NeurIPS, ICML, ICLR, EMNLP, Nature, ACL).
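For readers unfamiliar with the AUROC figure quoted above, the sketch below computes it from scratch: AUROC is the probability that a randomly chosen positive (anomalous) example receives a higher detector score than a randomly chosen negative one, with ties counted as half. The scores and labels are invented for illustration and are not EleutherAI results.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) AUROC: fraction of positive/negative pairs
    in which the positive example is scored higher; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented anomaly scores: label 1 = anomalous, label 0 = normal.
example_scores = [0.9, 0.2, 0.8, 0.1]
example_labels = [1, 1, 0, 0]
example_auroc = auroc(example_scores, example_labels)  # 3 of 4 pairs ranked correctly
```

A score of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so 0.95 means the detector ranks an anomalous example above a normal one in about 95% of pairs on those tasks.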
Infrastructure. lm-evaluation-harness (11,900+ GitHub stars) is the backend for Hugging Face's Open LLM Leaderboard and is used by NVIDIA, Cohere, Google, and Microsoft. Before it existed, every lab ran evaluations differently, making meaningful comparison impossible. The Pythia model suite (28 models, identical data/order, 154 checkpoints) was specifically designed to enable scientific research on training dynamics. Common Pile v0.1 (8TB, licensed/public domain text) addresses copyright concerns from the original Pile. EleutherAI's models have been downloaded over 70 million times.
Policy. Joint letter with Mozilla and Hugging Face opposing SB 1047 (arguing it would harm open-source AI). EU AI Act advocacy for open-source exemptions. Stella Biderman provided Senate testimony on open-source AI (Sept 2023). EleutherAI positions itself as the policy voice for the open-source AI safety community.
Alignment. Alignment MineTest project (Curtis Huebner): studying corrigibility and embedded agency in RL environments. Weak-to-strong generalization experiments. Reward hacking research. Third-party evaluation framework with OpenMined for privacy-preserving safety evaluations.
Key People
Stella Biderman -- Executive Director and Head of Research. Mathematician; formerly held a concurrent role as Lead Scientist at Booz Allen Hamilton (a major defense contractor). Declined a $100M VC deal to preserve nonprofit status. ICLR 2025 keynote speaker. Gave Senate testimony on open-source AI. Author of "How Large Language Models Work" (Simon & Schuster). The decision-maker and public face of the org.
Nora Belrose -- Head of Interpretability. Self-described "AI optimist" (~1-2% p(doom)). Funded by Open Philanthropy ($2.6M over 4 years). No CS degree (BA in political science/linguistics). Self-taught ML researcher. Argues alignment is likely solvable with current techniques. Her research on ELK, LEACE, and concept erasure represents the technical core of EleutherAI's interpretability program.
Notable departures. All three co-founders left. Connor Leahy and Sid Black founded Conjecture (an alignment startup; Leahy estimates ~90% p(doom)). Leo Gao went to OpenAI's alignment team. Leahy attributed his departure to the difficulty of getting focused research done in a volunteer organization, though he remains on the board. Curtis Huebner (Head of Alignment, ~90% p(doom)) leads the MineTest project.
Team: ~24 full/part-time staff + ~12 regular volunteers. Staff regularly poached by OpenAI, Anthropic, Google at 3-4x salary.
Money and Incentives
Total budget: Under $3 million annually (as of Feb 2025). For context, this is less than half of what DeepSeek claimed to spend training a single chatbot.
Funding sources: Open Philanthropy ($2.6M, 4-year grant for Nora Belrose's interpretability work -- about $650K per year, with the total nearly equal to one year's entire budget), Omidyar Network, Hugging Face, Stability AI, Canva, Nat Friedman, Lambda Labs, Jack Dorsey's Start Small, Mozilla Foundation.
Compute: Entirely dependent on donated compute. Google TRC (TPUs for GPT-Neo, GPT-J), CoreWeave (GPUs for GPT-NeoX, Deep Ignorance), Stability AI (AWS cluster share), Prime Intellect, UK academic clusters. EleutherAI cannot do large-scale training without donated resources.
Business model: Grants and donations from a mix of philanthropic funders and commercial AI companies. No product revenue. No VC investment (explicitly declined).
The $100M VC test: An unnamed prominent VC offered to raise $100M+ if EleutherAI dropped its 501(c)(3) status. Biderman declined. This is the strongest evidence of genuine mission commitment.
Incentive concerns: (1) Several funders (Hugging Face, Stability AI, Canva) are commercial AI companies that benefit from open-source AI -- their interests align with EleutherAI's mission but could create pressure against safety positions that would restrict open-source releases. (2) The Open Philanthropy grant is earmarked for one researcher's program, creating funding concentration. (3) Stella Biderman's former dual role at Booz Allen Hamilton (a defense contractor) has received no public scrutiny. (4) Staff are regularly poached at 3-4x salary, making EleutherAI a de facto training pipeline for frontier labs.
Legal structure: 501(c)(3) nonprofit (EleutherAI Institute), incorporated early 2023. EIN: 92-2215190. No 990 filings yet digitized, so independent verification of finances is not possible.
What Others Say
The case against open-source AI (IEEE Spectrum, Jan 2024): "Open-Source AI Is Uniquely Dangerous." Safety features can be stripped from open models (creating "uncensored" derivatives), models proliferate irreversibly, and developers lose all control post-release. Specific citation: WormGPT and FraudGPT, dark web cybercrime tools, were built on EleutherAI's GPT-J. The author recommends pausing releases of unsecured AI and establishing developer liability for foreseeable misuse.
GPT-J misuse: FraudGPT and WormGPT are "widely available on dark web markets. Both are based on the open-source large language model GPT-J developed by EleutherAI in 2021" (IBM). No public EleutherAI response to this specific misuse has been found.
Co-founder's self-critique (Connor Leahy): "I think there's absolutely a world in which EleutherAI is net negative. I do not want to deny that." He specifically identifies releasing The Pile as potentially a mistake, while defending model releases as always below the frontier. He left because getting focused alignment research done was too hard in a volunteer org.
The cultural bridge argument (Connor Leahy): "People in the ML community really respect EleutherAI... And then they also say, well, these guys take alignment really seriously. Well, maybe we should take it seriously too." Claims 4 of 5 core EleutherAI people went into alignment work.
The Pile copyright controversy: The Pile contained Books3 (copyrighted books) and YouTube subtitles from 170,000+ videos, as reported in a Wired/Proof News investigation (July 2024). EleutherAI responded with Common Pile (licensed/public domain only).
Hacker News skepticism about nonprofit transition: Multiple commenters drew parallels to OpenAI's trajectory. Biderman acknowledges that "How do we know you won't pull an OpenAI?" is a question she has gotten used to answering.
What's Absent
- No 990 financial data available for independent verification of claims about budget, compensation, or spending.
- No public response from EleutherAI to the specific misuse of GPT-J for WormGPT/FraudGPT.
- Current board composition unknown after Emad Mostaque's departure from Stability AI.
- No formal conflict of interest disclosure policy documented despite board members' ties to Conjecture and Stability AI, and ED's former role at Booz Allen Hamilton.
- No published annual report or financial transparency document.
- No clear public articulation of where the current capability threshold for withholding models lies.
- Post-Stability AI compute situation undocumented.
Recommended Reading
Stella Biderman in Chronicle of Philanthropy (Feb 2025) -- The most candid source. Discusses $3M budget, declining $100M VC deal, employee poaching, mission tension between nonprofit and for-profit AI worlds. https://www.philanthropy.com/news/how-philanthropy-built-lost-and-could-reclaim-the-a-i-race/
IEEE Spectrum: "Open-Source AI Is Uniquely Dangerous" (Jan 2024) -- The strongest counterargument to EleutherAI's theory of change, with specific reference to misuse of GPT-J. https://spectrum.ieee.org/open-source-ai-2666932122
Connor Leahy on Inside View (July 2022) -- 3-hour interview with co-founder who left. Founding story, misconceptions, why the volunteer model failed, defense and critique of open-source policy. https://theinsideview.ai/connor2
Deep Ignorance blog post (Aug 2025) -- EleutherAI's most important safety output. First evidence that pretraining data filtering creates tamper-resistant safeguards in open-weight models. https://blog.eleuther.ai/deep-ignorance/
Nora Belrose interview (Nov 2023) -- Head of Interpretability's "AI optimist" worldview, ~1-2% p(doom), arguments for alignment by default. Essential for understanding current EleutherAI's intellectual culture. https://www.theojaffee.com/p/7-nora-belrose