Theory of Change
MAI's theory of change begins with a philosophical claim: current AI alignment approaches fail because they confuse values with preferences, slogans, and rules. MAI defines values as "constitutive attentional policies" -- what people pay attention to when making choices they see as part of living well, rather than as merely instrumental. Their causal chain:
- Develop rigorous definitions of what human values actually are (drawing on Charles Taylor, Ruth Chang, Amartya Sen)
- Build democratic processes that elicit these values from diverse populations and aggregate them into a "moral graph" -- a directed graph where edges represent broad agreement that one value is wiser than another in context
- Use this moral graph to align AI systems (replacing RLHF/Constitutional AI), economic mechanisms, and democratic institutions
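To make the moral graph concrete, here is a minimal Python sketch of the data structure as described above. It is an illustration under stated assumptions, not MAI's implementation: the class and method names, the vote-tallying scheme, the 0.7 agreement threshold, the 3-vote minimum, and the rule that a value with no accepted "wiser" successor counts as a winner are all hypothetical.

```python
from collections import defaultdict


class MoralGraph:
    """Hypothetical sketch of a moral graph: nodes are values, and a directed
    edge (a, b) in a context means participants broadly agreed that value b
    is wiser than value a in that context. Not MAI's actual code."""

    def __init__(self):
        # (context, from_value, to_value) -> [agree_count, total_count]
        self.votes = defaultdict(lambda: [0, 0])

    def record(self, context, from_value, to_value, agrees):
        """Record one participant's judgment on whether to_value is wiser
        than from_value in the given context."""
        tally = self.votes[(context, from_value, to_value)]
        tally[1] += 1
        if agrees:
            tally[0] += 1

    def edges(self, context, threshold=0.7, min_votes=3):
        """Return edges in a context that meet an assumed "broad agreement"
        bar: at least min_votes judgments and an agreement share >= threshold."""
        accepted = []
        for (ctx, a, b), (agree, total) in self.votes.items():
            if ctx == context and total >= min_votes and agree / total >= threshold:
                accepted.append((a, b))
        return accepted

    def wisest(self, context, **kwargs):
        """Among values on an accepted edge, return those with no accepted
        edge pointing to a wiser value (an assumed winner rule)."""
        accepted = self.edges(context, **kwargs)
        nodes = {v for edge in accepted for v in edge}
        superseded = {a for a, _ in accepted}
        return sorted(nodes - superseded)


if __name__ == "__main__":
    g = MoralGraph()
    # Hypothetical votes on two hypothetical values in a parenting context.
    for agrees in [True, True, True, True, False]:
        g.record("parenting", "Enforce obedience", "Nurture curiosity", agrees)
    print(g.edges("parenting"))   # [('Enforce obedience', 'Nurture curiosity')]
    print(g.wisest("parenting"))  # ['Nurture curiosity']
```

The agreement threshold is what turns individual judgments into "broad agreement": 4 of 5 votes (0.8) clears the assumed 0.7 bar, so the edge is kept. A real aggregation would likely also have to deal with cycles and with edges that conflict across contexts; the simple sink rule here only shows how "wiser than" edges can single out values.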
In their own words: "We're a research organization. We're not just working on AI alignment, we're working on re-envisioning the whole stack around meaning, including AI but also including institutions like democratic institutions and markets eventually." (Oliver Klingefjord, 2025)
The Full-Stack Alignment paper (Dec 2025) frames this as a third paradigm -- "thick models of value" (TMV) -- beyond preferentist models of value (PMV, i.e. utility functions) and values-as-text (VAT, i.e. constitutional AI principles).
What They Do
Core experiment: Democratic Fine-Tuning (DFT) with 500 Americans (2023), funded by a $100K grant from OpenAI's Democratic Inputs to AI program. Participants used a chatbot to articulate values for contentious scenarios (abortion, parenting, weapons). Key results: 97% could articulate a value in the required format; 89% endorsed the moral graph as fair even when their own value wasn't the winner; and Republicans and Democrats converged on shared "wiser" values.
Model prototype: WiseLlama-8B, an open-source LLaMA model fine-tuned to display "model integrity" -- acting from coherent, inspectable values rather than rules or vague traits. Small user study (n=78) showed participants found it less "preachy" than default models and could predict its behavior from its values.
Papers: "What are human values, and how do we align AI to them?" (arXiv 2024) and "Full-Stack Alignment" (arXiv 2025, 30+ co-authors from MIT, Oxford, CMU, Stanford, Harvard). Neither has been peer-reviewed.
Expanding scope: Market intermediaries (2025-2026) -- AI-enabled economic mechanisms that pay suppliers based on measured human flourishing rather than engagement metrics. Planning a 200-person pilot.
Zero adoption: No AI lab, platform, or institution has deployed any MAI technology. Klingefjord confirmed in 2025: "No, I don't think there is any AI model or service that people know is trained using this approach." The OpenAI grant work was never integrated into any product due to OpenAI's internal turmoil in late 2023.
Key People
Joe Edelman -- Founder & Director. Philosopher-technologist who co-founded the Center for Humane Technology with Tristan Harris and coined "Time Well Spent." Background in HCI and programming language design. Has been working on "what to optimize instead of engagement" since 2013. Not an ML researcher -- the intellectual framework is philosophical.
Ryan Lowe -- Research Ecosystem Lead. Co-led InstructGPT, OpenAI's RLHF-trained instruction-following model, and was a co-lead for GPT-4 alignment. Left OpenAI ~March 2024 to join MAI. 49K+ Google Scholar citations. His move from OpenAI to a 3-person nonprofit is arguably the strongest signal of belief in MAI's approach.
Oliver Klingefjord -- Co-Founder, Technical Lead. Swedish software entrepreneur (founded Potential.app, acquired 2022). Mathematics background. First author on the human values paper.
Team is 3-3.5 FTE, supplemented by a 13-member research network of academics from major institutions and a 6-member advisory board including Aviv Ovadya (AI & Democracy Foundation), Brian Christian (author of "The Alignment Problem"), and Liv Boeree.
Money and Incentives
- Total confirmed funding: ~$265K (OpenAI $100K, SFF $165K)
- Additional unconfirmed: ARIA (UK government, amount unknown), Manifund projects (amounts unclear)
- Total budget: Unknown -- fiscal sponsorship via Hack Foundation means no public 990 filings
- Business model: Pure grants and donations, no product revenue
- No Open Philanthropy / Coefficient Giving funding -- notable absence for an AI safety org of this age
- No compute dependencies or lab funding ties -- MAI is financially independent of frontier AI labs
- No structural conflicts of interest identified -- the OpenAI grant was a one-time competitive award, not an ongoing relationship
The financial picture is precarious. The $265K confirmed over 2.5+ years of operation averages at most about $106K per year, a very modest sum for a 3-3.5 FTE team. The lean structure keeps burn rate low but limits execution capacity. The research network model (academics contributing from other institutions) extends reach without direct costs.
What Others Say
Peer-reviewed assessment of all 10 OpenAI Democratic Inputs teams (Moats & Ganguly, PMC 2025): Notes that as MAI's values become more universal, "they might also become less useful for action." Questions whether the entire Democratic Inputs program was taken seriously by OpenAI: "many of the teams also had the impression that OpenAI was not all that interested in the ultimate results."
Substack commenter (strongest direct critique found): "The appropriation of wisdom as something that trickles up from democracy and massive public processes is an interesting, but potentially dangerous claim... Have massive public processes historically proven to result in wisdom? Our collective is also an adolescent culture that is not trustworthy." Advocates for "wise elders" rather than mass participation.
LessWrong commenter (MiguelDev): "This approach demands a well-informed population capable of setting aside their biases."
Academic critique (Moral Disagreement paper, PMC 2025): Argues crowdsourcing, RLHF, and Constitutional AI all fail to accommodate reasonable moral disagreement. MAI's DFT is a more sophisticated variant but faces the same fundamental challenge.
LessWrong community engagement: Minimal. The technical alignment community appears largely not to have engaged with MAI's work. This silence is itself a finding -- the approach sits at the intersection of philosophy, social choice theory, and alignment, a niche few technical researchers inhabit.
What's Absent
- No formal benchmarks comparing DFT-aligned models against RLHF or CAI baselines. For an org proposing to replace these methods, this is the critical missing evidence.
- No replication of the 500-person experiment. Single pilot, Americans only, three scenarios.
- No adoption by any external entity after 2.5+ years.
- No engagement with catastrophic risk -- MAI works on "what to align to," not "how to prevent AI from pursuing misaligned objectives." No treatment of deceptive alignment, rapid capability gain, or loss-of-control scenarios.
- No peer-reviewed publications. Both papers are arXiv preprints.
- No response from Anthropic to MAI's direct critique of Claude's constitution.
- No substantive critique from the technical alignment community specifically targeting MAI's approach. That community has neither endorsed nor challenged the moral graph concept.
Recommended Reading
Oliver Klingefjord interview -- Democracy Innovators (2025) -- the most candid explanation of what MAI does, how the moral graph works, and the founding story. Start here for unfiltered understanding. https://democracyinnovators.com/oliver-klingefjord-about-the-meaning-alignment-institute-and-how-to-bring-up-wisdom-in-a-collective/
"Bringing AI participation down to scale" -- Moats & Ganguly (PMC 2025) -- the strongest external assessment, reviewing all 10 OpenAI Democratic Inputs teams. Questions fundamental assumptions. https://pmc.ncbi.nlm.nih.gov/articles/PMC12142630/
"What are human values?" (arXiv 2024) -- the foundational technical paper. https://arxiv.org/abs/2404.10636
"Model Integrity" (Substack, Dec 2024) -- the most concrete product vision. https://meaningalignment.substack.com/p/model-integrity