Aligned AI

Research

Stuart Armstrong. FHI spinoff.

Founded: 2021
HQ: Thame, Oxfordshire, England
Team: 5
Structure: limited company (UK)
Model: Vc Investment

Theory of Change

Aligned AI was founded on the theory that concept extrapolation -- teaching AIs to generalize human concepts correctly across changing environments -- is "the very center" of AI alignment. Co-founder Stuart Armstrong:

"At some point, it'll become much easier to generalize capabilities than to generalize values, goals, those kind of objects. If the goals change as the world model changes, then you are naturally -- well not naturally, artificially -- solving part of the alignment problem." (AXRP #18, Sep 2022)

The mechanism: develop algorithms (ACE) that notice when an AI's world model changes, generate multiple candidate extrapolations of its goals, and select safe ones. Armstrong gave his agenda a 10% chance of directly achieving alignment and a 95% chance that "some of these ideas will be very useful for other methods of alignment."

By 2025-2026, the theory of change has shifted fundamentally. CEO Rebecca Gorman now argues that LLMs are "glorified autocomplete," scaling laws are dead, and AGI via chatbots is impossible. The company has pivoted to Canvas, an offline family-safe hardware+software ecosystem for children, with a Wikimedia Enterprise partnership. The new theory of change addresses near-term AI harms (parasocial relationships, cognitive deskilling, content safety) rather than existential risk from misaligned superintelligence.

The company's mission-lock clause is broad enough to encompass both: "To create technology (particularly AI and machine learning) that increases human safety, agency, ability, well-being, self-actualisation, and understanding."

What They Do

Technical alignment research (2022-2023):

ACE (Algorithm for Concept Extrapolation) beat the CoinRun misgeneralization benchmark: 72% vs 59% prior best (55% random baseline). Patent pending.
Content moderation tool scored 97% vs OpenAI's 32% on Google Jigsaw's benchmark.
faAIr: gender bias measurement tool tested across 14 LLMs. Won CogX 2023 "Best Innovation: Algorithmic Bias Mitigation."
GPT-Eliezer/DATDP: jailbreak defense method blocking 99.5-100% of augmented jailbreak attempts. Open-sourced.

Near-term safety and consumer product (2025-2026):

AI Chaperones: framework for preventing parasocial relationships with chatbots (Aug 2025 paper).
Emergent misalignment analysis: proposed "broken superego" hypothesis for finetuned models (Mar 2025).
Canvas: offline digital ecosystem with e-reader, "Privacy Pod" (family cloud), offline Wikipedia, non-chatbot AI, and coding tools for children. Announced Feb 2026, waitlist only.
Wikimedia Enterprise partnership for offline Wikipedia integration (Mar 2026).

Policy engagement: Gorman has advised EU, UN, OECD, and UK Parliament. Papers on EU AI Act manipulation mechanisms (27 citations).

Publication record: 7 formal papers (arXiv), ~15 blog posts on technical topics. No publications in top ML venues (NeurIPS, ICML, ICLR). Most-cited paper: "Recognising the importance of preference change" (36 citations, co-authored with Franklin and Ashton).

Key People

Rebecca Gorman -- Co-Founder & CEO. American, grew up in Silicon Valley, programmed since age 8, Oxford-educated. 20+ years AI experience. Strongly contrarian worldview: dismisses LLMs as "glorified autocomplete" and predicts an AI winter. REWork Top 100 Women Advancing AI, Fortune 50 AI Innovators (both 2023). SERI MATS mentor.

Stuart Armstrong -- Co-Founder & Chief Mathematician. British/Canadian. Decade at FHI Oxford. Pioneer of safely interruptible agents (with DeepMind's Orseau), corrigibility, low-impact agents. Author of "Smarter Than Us" (MIRI, 2014). Co-authors include Bostrom, Orseau, Leike, Soares, Sandberg. DPhil in Cartan geometry.

Team: 2-10 employees per LinkedIn. Social anthropologist Jocelyn Kelley joined for Canvas market research. Advisors include Dylan Hadfield-Menell (MIT), Adam Gleave (FAR.AI), Anders Sandberg.

Notable departure: Dr. Adam Bell served as director for only 2 months (May-Jul 2022). Described as "IP Advisor" -- likely a limited-scope engagement.

Money and Incentives

Total funding: ~$730K angel round (Sep 2023) at $24M post-money valuation. Investors anonymous. Additional small raises suggested by share allotments in Jun 2022, Mar 2024, and Jan 2025. Funded by "bootstrapping and angel investments."

Revenue: No evidence of product revenue from any source. EquitAI, faAIr, and ClassifAI have no visible customers. Canvas is pre-launch (waitlist). Gorman claimed "more inbound sales inquiries than we can service" in Aug 2023, but no visible adoption followed.

No Coefficient Giving/Open Philanthropy funding. This is notable: CG/Open Phil is the dominant funder of alignment research organizations. The absence suggests either non-application, rejection, or deliberate avoidance.

Financial health is opaque. UK "total exemption full accounts" provide minimal detail. A ~$730K raise in Sep 2023 should be nearly exhausted by March 2026 even at minimal burn. The Jan 2025 share allotment may represent bridge funding.

Compute constraints: Gorman reveals they investigated matching OpenAI's compute but found it infeasible: "we put a ridiculous amount of time and energy dragging up every piece of free and paid and back-room intel on how much compute was used to train GPT-3 and GPT-4."

Business model: Currently unclear. Canvas appears to be the intended revenue path -- a hardware+software consumer product -- but no pricing, ship date, or hardware specs are public. The company has no visible government contracts, lab partnerships, or consulting revenue.

Incentive analysis: As a for-profit with mission-lock, the company needs to generate revenue or raise more capital. The pivot from alignment research (hard to monetize) to consumer product (addressable market exists) is consistent with survival pressure. The risk: consumer product demands diverge from alignment research priorities.

What Others Say

External criticism is essentially absent. No alignment researcher has published a focused critique of Aligned AI's approach, concept extrapolation's viability, or the company's commercial strategy. This reflects the company's small size rather than the quality of its work -- it is below the radar of most commentators.

The alignment research community has not engaged deeply. No Open Phil funding, no mentions in major alignment roundups, limited citations. Armstrong went from being a widely-cited FHI researcher to running a company whose alignment work the community largely ignores.

Gorman's "state of AI" views are themselves controversial. Her claim that scaling inference "can't work" because errors compound oversimplifies -- it ignores verification steps, error correction, and the distinction between serial and parallel chains. "Glorified autocomplete" is reductive about transformer capabilities. A thoughtful critic would note that dismissing the entire LLM paradigm conveniently aligns with having failed to raise compute-scale funding.

The pivot can be read charitably or uncharitably. Charitably: Gorman genuinely assessed the AI landscape, concluded near-term harms matter more, and built a product to address them. Uncharitably: the company couldn't compete in alignment research, couldn't raise significant funding, and pivoted to a consumer market where small teams can compete.

Fortune coverage (2023) positioned the company as a credible small player: "These small startups are making headway on A.I.'s biggest challenges."

What's Absent

No concept extrapolation applied beyond toy benchmarks (image classification, video games) to real-world systems after 4 years.
No publications in top ML venues despite Armstrong's prior academic standing.
No visible product revenue or customer base from any product.
No engagement from major alignment organizations with the company's research.
Angel investor identities undisclosed.
EA Forum launch post and AMA (Feb/Mar 2022) inaccessible -- likely contain the most detailed original theory of change.
No articulation of how Canvas connects to alignment research expertise.
No evidence Armstrong is continuing to advance concept extrapolation theory post-2023.

Stated Theory of Change

Original (2022-2023): Concept extrapolation solves the core alignment problem. AIs that can correctly generalize human concepts and values across changing environments will remain aligned even as their capabilities grow. Build ACE as the practical implementation, demonstrate on benchmarks, then apply to real-world systems. Armstrong's mechanism: "If the goals change as the world model changes, then you are naturally solving part of the alignment problem."

Current (2025-2026): Near-term AI harms are the priority because AGI via LLMs is impossible and the chatbot industry is collapsing. Build Canvas as a consumer product that embodies alignment principles: non-chatbot AI, offline operation, human agency over algorithmic engagement, prevention of parasocial relationships. The company's alignment expertise informs product design rather than producing alignment research for the field.

Revealed Theory of Change

Actions tell a different story from either stated narrative:

The research output has shifted from fundamental to applied. The 2022-2023 papers (concept extrapolation, CoinRun, bias measurement) were advancing an alignment research agenda. The 2025-2026 papers (DATDP jailbreak defense, AI Chaperones, emergent misalignment analysis) are practical safety engineering. This isn't bad work -- DATDP is genuinely useful -- but it represents a retreat from the founding ambition of solving the "very center" of alignment.
The team composition has shifted toward product. Hiring a social anthropologist (Jocelyn Kelley) for market research signals consumer product priorities. No visible alignment researcher hires.
The blog has become a commentary outlet. Recent posts (Grok, LLM pain, chess) are analysis/opinion pieces about other companies' failures rather than advancing the company's own research program.
Canvas displaces alignment research as the organizational focus. The homepage now leads with Canvas. The Wikimedia partnership, the market research, the product design -- these are the activities consuming the team's limited resources.

The revealed theory of change is: Aligned AI is becoming a consumer tech company that applies safety-first design principles, informed by but no longer advancing alignment research.

Key Assumptions

Assumption 1: Concept extrapolation is central to alignment.

Evidence for: Armstrong's theoretical arguments are sophisticated; the CoinRun result demonstrates the basic mechanism works; analogies to physics and moral philosophy are compelling.
Evidence against: The alignment field has not engaged with this claim. RLHF, constitutional AI, and scalable oversight approaches have attracted far more research attention. Concept extrapolation may be important but not uniquely central.
Testable? Partially -- you could test whether ACE-like algorithms help with increasingly complex alignment benchmarks. Not yet tested.
If wrong: The founding theory of change is a dead end, and the pivot to near-term safety products was correct for the wrong reason.

Assumption 2: LLMs cannot achieve AGI and the chatbot industry will collapse.

Evidence for: Gorman's arguments about scaling law exhaustion have partial support from 2024 reporting. Her probability-compounding argument against scaling inference has logical structure.
Evidence against: Reasoning models (o1, o3, DeepSeek) showed significant capability improvements post-scaling-law concerns. "Glorified autocomplete" is reductive about emergent capabilities. "AI winter" predictions have been wrong many times before.
Testable? Yes, within 2-3 years.
If wrong: The Canvas pivot is based on a misdiagnosis, and the company has abandoned alignment research just as AI systems become more capable and harder to align.

Assumption 3: A hardware+software offline consumer ecosystem can compete commercially.

Evidence for: Real parental anxiety about children's technology use. Wikimedia partnership provides credible content. Product philosophy is genuinely distinctive (offline-first, non-chatbot AI).
Evidence against: Hardware+software consumer products require significant capital, supply chain, and distribution -- none of which a 2-10 person company has demonstrated. Competing with Apple, Google, and Amazon in family tech is extraordinarily difficult.
Testable? Yes, by Canvas adoption within 12-18 months of launch.
If wrong: The company burns through remaining capital without achieving product-market fit.

Strengths

Armstrong's intellectual depth is genuine. His pre-founding work (safely interruptible agents, corrigibility, low-impact AIs) is foundational to the field. The concept extrapolation theory, even if not central to alignment, contributes genuinely novel ideas about how values fragment and recombine. His 10% self-assessment of direct success probability is rare intellectual honesty.
The practical safety tools are real. DATDP (jailbreak defense), AI Chaperones (parasocial prevention), GPT-Eliezer (prompt evaluation), faAIr (bias measurement) -- these are concrete, open-sourced contributions that others can build on. The CoinRun result, while modest, is a legitimate benchmark improvement.
The Canvas product philosophy is distinctive. Non-chatbot AI, offline-first, friction-by-design, human agency over algorithmic engagement -- this is a genuinely different approach to consumer AI that could serve as a demonstration of alignment-informed product design.
Mission-lock commitment. As a benefit corporation with a charter-level mission clause, the company has at least stated its intent to avoid pure commercial drift. This is more structural commitment than most startups make.
Policy engagement is real. Gorman's EU/UN/OECD advisory work and the EU AI Act manipulation papers contribute to governance, not just research.

Weaknesses and Risks

The pivot looks like a retreat disguised as strategy. Moving from "solving the very center of alignment" to "offline family AI ecosystem" is a massive scope reduction. The narrative that "chatbots are dying so we're building something better" is convenient but untested. The most parsimonious explanation combines genuine belief-change with resource constraints: they couldn't raise alignment-research-scale funding, and near-term consumer products are more fundable.
The alignment research community has not engaged. Zero CG/Open Phil funding, no top-venue publications, limited citations, no engagement from major alignment labs. This is not definitive proof the work is wrong, but the revealed-preference signal from the ecosystem that funds and evaluates alignment research is strongly negative.
Financial viability is uncertain. ~$730K raised over 4 years with no visible revenue. Canvas is a hardware+software product requiring manufacturing, supply chain, and go-to-market -- all capital-intensive. The Jan 2025 share allotment suggests bridge funding. It's unclear how the company survives long enough to ship Canvas.
Gorman's contrarianism may be overdone. "LLMs are glorified autocomplete" was a defensible position in 2023 but looks increasingly strained given o1/o3/DeepSeek reasoning improvements. Dismissing the entire industry while building a product that presumably uses LLM components (Canvas's AI features) creates internal tension. A CEO who sees AI winter coming while raising money for an AI product faces a credibility challenge.
Concept extrapolation has not scaled. After 4 years, demonstrations remain at image classification and video game benchmark level. The gap between "ACE gets 72% on CoinRun" and "safely extrapolate human values for superintelligent AI" is vast, and no progress on closing it has been published since late 2023.
Team is extremely thin. 2-10 people attempting to simultaneously conduct alignment research, build a hardware consumer product, write papers, and advise governments. This is a recipe for doing many things poorly rather than one thing well.

Cross-References

FAR.AI (Adam Gleave): Advisor relationship. Both work on alignment, but FAR.AI has maintained focus on alignment evaluations and has CG/Open Phil funding. Contrast in trajectories is instructive.
MIRI: Armstrong's intellectual ancestry. His "Smarter Than Us" was a MIRI publication. But MIRI's approach (theoretical deconfusion) diverges from Aligned AI's practical/product approach.
Anthropic/DeepMind alignment teams: Armstrong co-authored with Jan Leike (now Anthropic) and Laurent Orseau (DeepMind). These labs have vastly more resources to pursue alignment research. The question of whether a 5-person startup can contribute meaningfully alongside billion-dollar labs is fundamental.
Calm technology movement: Canvas's design philosophy ("calm, intentional technology") echoes the calm technology principles of Amber Case. More of a product design reference than an alignment one.
AI safety startup landscape: Companies like Redwood Research, ARC, and METR have maintained alignment research focus with CG/Open Phil funding. Aligned AI's pivot away from research toward consumer product is unusual in this cohort.

What Would Change This Assessment

Canvas ships and gains traction. If Canvas actually reaches market and demonstrates that alignment-informed product design resonates with consumers, the pivot narrative becomes "correct strategic foresight" rather than "retreat."
Concept extrapolation publishes at a top venue. A NeurIPS or ICML paper showing ACE-style methods working on harder benchmarks or real-world systems would validate the research program.
CG/Open Phil funds them. Would signal that the alignment funding ecosystem sees value in the approach.
A major alignment researcher engages with concept extrapolation. Even a critical engagement would indicate the idea is taken seriously.
Gorman's AI winter prediction proves correct. If the LLM industry contracts significantly, her positioning looks prescient.
Canvas fails and the company returns to alignment research. Would suggest the pivot was a survival strategy, not a genuine belief change.

Self-Critique

I may be unfairly harsh on the pivot. My assessment pattern-matches "alignment startup pivots to consumer product" to "retreat," but there's a genuine case that near-term harms (parasocial relationships, cognitive deskilling in children) are important and neglected, and that Canvas addresses real needs.
I lack the EA Forum launch post and AMA. These would likely contain the most detailed articulation of the original theory of change and the community's initial reception. Without them, I'm reconstructing the early narrative from secondary sources.
I'm uncertain about financial health. UK company accounts provide almost no information. The company may have secured more funding than visible, or may have consulting/contract revenue not publicly disclosed.
My assessment of Gorman's contrarian views is from within the AI-capability-is-real mainstream. If she's right that LLMs can't achieve AGI and an AI winter is coming, my entire framing is wrong.
Weakest claim: My claim that "the alignment research community has not engaged" is based on absence of evidence (no citations, no funding, no public discussion) which could simply reflect the company being too small, not the work being low-quality. Armstrong's pre-founding work was excellent; there's no reason to assume his company work is inferior simply because it's less cited.
What would most change my view: Access to the EA Forum launch post and AMA, or a conversation with Armstrong about whether concept extrapolation research continues privately. Evidence that Canvas has a realistic path to market would also significantly update my assessment.

Connected to (6)

MATScollaborator · Rebecca Gorman FAR.AIadvisor at · Adam Gleave Future of Humanity Institutestaff from · Stuart Armstrong Google DeepMindcollaborator · Laurent Orseau

Wikimedia Foundationcollaborator

Machine Intelligence Research Institutecollaborator · Stuart Armstrong

Sources (59)

Every URL that was read during research.

1.Aligned AIbuildaligned.ai
2.A little U.K. AI startup has made big progress on the CoinRun AI safety challenge | Fortunefortune.com
3.Meet Rebecca Gorman, whose company, Aligned AI, is trying to match up human values with machine learning | Fortunefortune.com
4.Dr Stuart Armstrong, co-founder of Aligned AIenspire.ox.ac.uk
5.Helping A.I. understand the world like a humanpreseednow.com
6.Why AI’s Fail - Foresight Instituteforesight.org
7.CoinRun: Solving Goal Misgeneralisationarxiv.org
8.Aligned AI / Blogbuildaligned.ai
9.Aligned AI uses Enterprise for family-friendly Ethical and Safe AIenterprise.wikimedia.com
10.Future of Humanity Institute - Wikipediaen.wikipedia.org
11.Rebecca Gorman, co-founder of Aligned AIenspire.ox.ac.uk
12.AI Alignment Podcast: Synthesizing a human's preferences into a utility function with Stuart Armstrong - Future of Life Institutefutureoflife.org
13.Hire Dr. Stuart Armstrong | AI Speaker Agentai-speakers-agency.com
14.Hire Rebecca Gorman | AI Speaker Agentai-speakers-agency.com
15.Stuart Armstrong - Foresight Instituteforesight.org
16.Aligned AI / Companybuildaligned.ai
17.Rebecca Gorman | Speaker | AI Governance, Risk & Ethics Summitvirtual.aiacceleratorinstitute.com
18.Rebecca Gormanoecd.ai
19.18 - Concept Extrapolation with Stuart Armstrongaxrp.net
20.ALIGNED AI LIMITEDfind-and-update.company-information.service.gov.uk
21.Aligning AI, before it's too late, with Rebecca Gorman - London Futuristslondonfuturists.buzzsprout.com
22.Concept Extrapolation: A Conceptual Primerarxiv.org
23.What do we Need to Do to Align AI? – Stuart Armstrongscifuture.org
24.Stuart Armstrong on AI Interpretability, Accidental Misalignment & Risks of Opaque AIscifuture.org
25.ALIGNED AI LIMITEDfind-and-update.company-information.service.gov.uk
26.ALIGNED AI LIMITEDfind-and-update.company-information.service.gov.uk
27.Aligned AI / Blogbuildaligned.ai
28.Aligned AI / Blogbuildaligned.ai
29.Aligned AI / Blogbuildaligned.ai
30.Aligned AI / Blogbuildaligned.ai
31.Aligned AI / Blogbuildaligned.ai
32.Aligned AI / Blogbuildaligned.ai
33.Aligned AI / Blogbuildaligned.ai
34.Aligned AI / Blogbuildaligned.ai
35.Aligned AI / Blogbuildaligned.ai
36.Aligned AI / Blogbuildaligned.ai
37.Aligned AI / Blogbuildaligned.ai
38.Aligned AI / Blogbuildaligned.ai
39.Aligned AI / Blogbuildaligned.ai
40.Aligned AI / Blogbuildaligned.ai
41.These small startups are making headway on A.I.'s biggest challenges | Fortunefortune.com
42.What is Aligned AI / Stuart Armstrong working on?aisafety.info
43.Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluationarxiv.org
44.The dangers in algorithms learning humans' values and irrationalitiesarxiv.org
45.Recognising the importance of preference change: A call for a coordinated multidisciplinary research effort in the age of AIarxiv.org
46.AI Chaperones Are (Really) All You Need to Prevent Parasocial Relationships with Chatbotsarxiv.org
47.Aligned AI / Blogbuildaligned.ai
48.Aligned AI / Blogbuildaligned.ai
49.Aligned AI / Blogbuildaligned.ai
50.Aligned AI / Blogbuildaligned.ai
51.Aligned AI / Blogbuildaligned.ai
52.Canvas by Aligned AIbuildaligned.ai
53.Aligned AI / Mediabuildaligned.ai
54.Aligned AI / Blogbuildaligned.ai
55.Aligned AI / Blogbuildaligned.ai
56.Aligned AI / Blogbuildaligned.ai
57.Stuart Armstrongscholar.google.com
58.Rebecca Gormanscholar.google.com
59.Aligned AI / Blogbuildaligned.ai