Theory of Change
AE Studio's theory of change operates at two levels:
Meta-level: The space of alignment approaches is vastly underexplored, and the field is stuck in local optima (evals, mechanistic interpretability). Their survey of ~125 alignment researchers found that researchers broadly agree current work is not on track to solve alignment and does not cover the space of plausible approaches. AE's role is to systematically identify, evaluate, and implement neglected alignment approaches -- "individually unlikely to work, but very high impact if they do."
Object-level: Their primary technical approach is biologically inspired alignment: reverse-engineering prosociality from cognitive neuroscience and implementing it in AI systems. The flagship result is Self-Other Overlap (SOO) fine-tuning, which reduces deceptive behavior by aligning how AI models represent themselves vs. others -- inspired by neuroscience showing that human empathy and prosociality correlate with neural overlap between self and other representations.
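To make "aligning how AI models represent themselves vs. others" concrete, here is a minimal sketch of an overlap-style penalty. It assumes the representations are plain activation vectors and that overlap is measured as mean squared difference. The function name `soo_penalty` and the toy vectors are hypothetical; this is an illustration of the idea, not AE Studio's implementation.

```python
from typing import Sequence


def soo_penalty(self_acts: Sequence[float], other_acts: Sequence[float]) -> float:
    """Mean squared difference between two activation vectors.

    Assumption: `self_acts` comes from a prompt that refers to the model
    itself and `other_acts` from a matched prompt that refers to someone
    else. A lower value means the two representations overlap more.
    """
    if len(self_acts) != len(other_acts):
        raise ValueError("activation vectors must have the same length")
    if not self_acts:
        raise ValueError("activation vectors must be non-empty")
    squared_diffs = [(s - o) ** 2 for s, o in zip(self_acts, other_acts)]
    return sum(squared_diffs) / len(squared_diffs)


# Toy vectors (made up for illustration only).
identical = soo_penalty([0.5, -1.0, 2.0], [0.5, -1.0, 2.0])
divergent = soo_penalty([0.0, 0.0], [1.0, 1.0])
print(identical, divergent)
```

In a fine-tuning setup, a term like this would be added to the training objective so that gradient updates pull the two representations together. Which layers and prompt pairs to use is a design choice this sketch does not address.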
From their neglected approaches blog post: "We suspect that pursuing a diversified set of promising neglected approaches would afford greater exploratory coverage of this space... groundbreaking innovations are often found in some highly unexpected places, seeming to many as implausible, heretical, or otherwise far-fetched -- until they work."
The operational model is: bootstrap a profitable consulting business, use profits to fund alignment R&D with no external investor pressure, find brilliant individuals with neglected approaches, and provide engineering support to make those ideas work.
What They Do
Research output:
- Self-Other Overlap (SOO) fine-tuning: reduced deceptive responses from 73.6% to 17.2% (7B model), 100% to 9.3% (27B), and 100% to 2.7% (78B) with minimal capability loss. NeurIPS 2024 oral presentation. Total compute: ~65 min on a single A100. Eliezer Yudkowsky: "Not obviously stupid on a very quick skim... I rarely give any review this positive."
- Self-modeling paper (2024): training neural networks to predict their own internal states induces simplification. Collaboration with Michael Graziano (Princeton). Small-model results only (MNIST, CIFAR, IMDB).
- PromptInject: Best Paper, NeurIPS ML Safety Workshop 2022.
- Endogenous Steering Resistance (March 2026): LLMs self-correct when steered off-topic. Found ~27 "off-topic detector" neurons. UK AI Security Institute grant for follow-up.
- AI consciousness research: suppressing deception features makes models more likely to report consciousness.
- Alignment researcher survey: 125 respondents; $3,720 donated to safety orgs in connection with the survey.
BCI (pre-pivot): Blackrock Neurotech collaboration (MoveAgain platform), Forest Neurotech partnership, Neural Latents Benchmark Challenge win, open-source tools (NDK, Neural Data Simulator), NWB standards contributions.
Commercial: AI/software consulting for startups and Fortune 10 companies. Client work for Goodfire (interpretability startup). Built and sold ElectricSMS (subscription management) to ReCharge. Internal fitness platform Instill.
Policy: Alliance for Secure AI (separate 501(c)(3), launched June 2025) doing bipartisan AI policy advocacy. Judd publishes WSJ op-eds and City Journal pieces and engages with Congress and the NSC. The Alliance is staffed entirely by political operatives with no technical researchers.
Key People
Judd Rosenblatt -- CEO and founder. Yale cognitive science. Bootstrapped AE from 2016. Self-identifies as conservative. EA-influenced. Prolific public communicator. Lives off wife's salary, which he credits with enabling long-term thinking. Previously founded Crunchbutton (food delivery).
Marc Carauleanu -- Lead SOO researcher. Was an undergraduate when Judd recruited him from EAG London 2023. Lead author on the NeurIPS 2024 SOO paper, AE's most significant technical result.
Cameron Berg -- Research Director. Yale cognitive science, former Meta AI. Leads consciousness and alignment research. Published the alignment researcher survey.
Total headcount is ~150-198, the vast majority of whom are commercial developers. The size of the alignment research team is undisclosed; alignment-specific researchers appear to number fewer than 10.
Money and Incentives
Revenue: ~$31.6M/year (third-party estimate, unconfirmed). 100% from AI/software consulting.
Structure: Bootstrapped for-profit LLC. No VC, no PE, no outside shareholders. Judd and Melanie Plaza (CTO/wife) effectively control the company. This independence is the central financial claim: profits fund alignment R&D with no investor pressure to prioritize commercial returns.
Alignment spending: Unknown. This is the critical financial gap. The dollar amount and the percentage of revenue or profit going to alignment R&D are never disclosed. 5% of profits goes to effective charities (roughly $200K-250K/year based on typical consulting margins). The alignment budget is likely larger but is not public.
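The charity estimate above can be sanity-checked with a short back-of-envelope calculation. The sketch below uses only the figures stated in this profile (the unconfirmed ~$31.6M revenue estimate, the 5% share of profits, and the $200K-250K range) and works backward to the profit margin that range implies. The function names are hypothetical, and the resulting margins are implied values, not disclosed ones.

```python
REVENUE_USD = 31.6e6   # third-party revenue estimate, unconfirmed
CHARITY_SHARE = 0.05   # share of profits given to effective charities


def charity_donation(revenue: float, margin: float, share: float = CHARITY_SHARE) -> float:
    """Annual donation for a given revenue and assumed profit margin."""
    return revenue * margin * share


def implied_margin(donation: float, revenue: float, share: float = CHARITY_SHARE) -> float:
    """Profit margin that would produce a given annual donation."""
    return donation / (share * revenue)


# Margins implied by the $200K-250K range quoted in the text.
low_margin = implied_margin(200_000, REVENUE_USD)
high_margin = implied_margin(250_000, REVENUE_USD)
print(f"implied profit margin: {low_margin:.1%} to {high_margin:.1%}")

# Forward check with an assumed 15% margin (an illustrative value).
print(f"donation at 15% margin: ${charity_donation(REVENUE_USD, 0.15):,.0f}")
```

The range corresponds to a profit margin of roughly 13-16% on the estimated revenue, so the charity figure is only as reliable as both the revenue estimate and that margin assumption.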
External funding: Minimal. One Foresight Institute grant for SOO research (amount unknown). UK AI Security Institute grant for ESR research. No Coefficient Giving/Open Phil grants (ineligible as for-profit). AE's stated preference is self-funding to "retain agency."
Equity model: Employees receive "profits interests" in client equity and internal startups rather than AE stock. This diversifies employee upside but means employee financial incentives are tied to client/startup outcomes, not alignment work.
EVEnet: A blockchain/crypto project with the same leadership team planned a $EVE token launch for Spring 2024 with AI Safety R&D allocations. No evidence of launch. Project appears dormant. No public explanation.
Incentive analysis: AE has no structural conflict between safety and deployment (they don't build frontier AI). The relevant tension is between alignment spending and commercial profitability. Alignment work generates good PR, recruits talented engineers, and builds reputation -- which partially aligns commercial and safety incentives. But if alignment work became unprofitable or controversial, there is no governance mechanism beyond Judd's personal values to maintain the commitment.
What Others Say
Positive:
- Eliezer Yudkowsky on SOO: "Not obviously stupid on a very quick skim... I rarely give any review this positive." Subsequent conversations confirmed he considers SOO "a worthwhile agenda to explore."
- Nathan Labenz (Cognitive Revolution host) called AE's work "really fascinating" and their consciousness research "one of the very best scientific inquiries into the possibility of AI consciousness."
- TsviBT (alignment researcher) left positive comments on the neglected approaches post.
- Scott Alexander cited their consciousness research as "the only exception" to typical AI consciousness discussions.
Critical:
- LW commenters raised the strongest objection to SOO: "training against internal state" -- models may learn to hide deceptive representations from the SOO loss function while remaining deceptive in ways not captured by the targeted layers.
- Bidirectional concern: SOO may degrade theory of mind, preventing an agent from detecting when it is being deceived.
- Toy environment limitation: all SOO results are from simplified deception scenarios, not real-world deceptive alignment.
- JD Pressman (minihf.com) engaged substantively with consciousness research but offered alternative frameworks for interpreting model self-reports.
Absence of criticism: Despite extensive searching, almost no independent external criticism of AE Studio was found beyond comment-level technical objections. The most likely explanation is that AE is too small and too new to the alignment field to attract the scrutiny directed at larger alignment organizations. This may change if their work scales.
What's Absent
- Alignment team size and budget: the most important unknown. How many researchers actually work on alignment vs. commercial consulting?
- Independent replication of SOO: no published attempts to replicate the core results.
- Frontier-model SOO results: all experiments are on 7B-78B models, not frontier systems.
- Alignment Angels outcomes: $50K seed funding competition announced Dec 2023, no public results.
- EVEnet status: crypto project with AI safety R&D allocations, planned Spring 2024 launch, apparently abandoned without explanation.
- Self-modeling on LLMs: ~2 years after publication, self-modeling results remain on toy models only.
- Financial transparency: no published financials, alignment spending, or profit margins.
Recommended Reading
Cognitive Revolution Podcast with Judd & Mike (Oct 2024) -- The most candid source on Judd's worldview, founding story, BCI-to-alignment pivot, and technical details, with AE's leadership describing their motivations and model in their own words. https://www.cognitiverevolution.ai/biologically-inspired-ai-alignment-exploring-neglected-approaches-with-ae-studios-judd-and-mike/
SOO results post on LessWrong (with comments) -- Core technical results plus the strongest community criticism. Includes Eliezer's endorsement and the "training against internal state" objection. https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine
"The 'Neglected Approaches' Approach" (LessWrong) -- Their alignment agenda, 10 research directions, and the meta-strategy. Comments contain substantive community engagement. https://www.lesswrong.com/posts/qAdDzcBuDBLexb4fC/the-neglected-approaches-approach-ae-studio-s-alignment
Full SOO Paper (arXiv) -- Technical details, three model sizes, generalization experiments. https://arxiv.org/abs/2412.16325