Theory of Change
FAR.AI's theory of change rests on one core observation: all contemporary ML systems are adversarially exploitable, and this breaks most alignment proposals. Because alignment schemes like RLHF, debate, and scalable oversight depend on helper ML systems overseeing the main system, an exploitable helper means alignment itself is compromised. FAR.AI calls this the central problem and proposes two solutions: improving adversarial robustness and developing "fault tolerant alignment" methods that work even with vulnerable systems.
In practice, this translates to a "defense-in-depth" approach spanning the full stack from research through engineering to policy. CEO Adam Gleave articulates it this way: "What makes us unique is that we are willing to do the hard thing and operate across all of those different layers of the stack, from groundbreaking exploratory research to the nitty-gritty engineering scaling work through to more advocacy and sales." The explicit goal is to develop safety techniques, prove they work at scale, get labs to adopt them, and then get governments to mandate them as standards.
Gleave's worldview is pragmatically optimistic. He estimates median timelines of 5-7 years for dangerous autonomous agents and ~14 years for fully autonomous AI organizations. He believes "a rigorous engineering approach" to defense-in-depth can work against constrained threat models, though he acknowledges it may not scale past human-level intelligence: "I don't know if you get past human-level intelligence safely with this."
What They Do
Research. 30+ papers since July 2022 founding. Signature work includes the adversarial attack on superhuman Go AI KataGo, the STACK methodology showing 71% bypass rate against multi-layer defenses (where conventional attacks scored 0%), lie detector training that can make models more honest or better liars depending on training choices, and jailbreak-tuning demonstrating that guardrails on frontier models can be cheaply removed.
Red-teaming. Testing defense-in-depth systems, including their own. The STACK work with UK AISI found that off-the-shelf Gemma 2 with few-shot prompting outperformed purpose-built safety models as a defense layer.
Regranting. $12M from Coefficient Giving (CG, formerly Open Philanthropy) for AI safety grantmaking. Nomination-only, with no public list of recipients.
Field building. Alignment Workshop series (80 to 300+ researchers, expanding to 3/year including Asia). ControlConf (with Redwood Research and UK AISI). TIAP policy conference in Washington DC.
Coworking. FAR.Labs in Berkeley hosting ~40 members including AI Impacts, MATS, and independent researchers.
Policy. Selected to lead the EU AI Act CBRN risk consortium (3-year contract, Feb 2026). Co-organized IDAIS bringing together Western and Chinese AI scientists. Spun out Safe AI Forum (SAIF) as independent 501(c)(3).
Consulting. Red-teams frontier labs under NDA. Charges market rates, with a 10% cap on revenue from for-profit developers. 2024 consulting revenue: $429K.
Key People
Adam Gleave (CEO, co-founder). PhD Berkeley under Stuart Russell. Former Jane Street and GSA Capital intern, DeepMind researcher (with Jan Leike, Geoffrey Irving). Board member of METR, SAIF, and LISA. Schmidt AI2050 Fellow. Compensation $229K (2024). In a 2019 interview he gave 10% probability to MIRI-style hard alignment and 40% weight to traditional AI risk arguments being unconvincing, a more moderate position than that of many safety org founders.
Karl Berzins (President, co-founder). Non-technical; background in corporate strategy (Advanced Navigation, Swire). Compensation $183K (2024).
Ethan Perez (collaborator, not staff). Anthropic research scientist. Early CG grants (~$2.2M, 2021-2024) funded research "led by" Perez through FAR.AI. He appears as a co-PI rather than an employee.
Team size was 11.5 FTEs as of late 2023, growing to 30+ researchers by 2026-2027. The org is hiring a COO to support scaling.
Money and Incentives
Revenue trajectory: $1.5M (2022) to $8.1M (2023) to $24.3M (2024). Net assets: $22.5M as of end 2024.
Funding concentration: 18 Coefficient Giving (CG, formerly Open Philanthropy) grants totaling $59.35M (Oct 2021 - Sep 2025) make up the vast majority of all funding. The largest single grant is $28.675M (Sep 2025) over 3 years.
Diversification (2025+): New funders include Schmidt Sciences, Survival and Flourishing Fund (Jaan Tallinn), CSET Georgetown, and the AI Safety Fund (supported by the Frontier Model Forum — Anthropic, Google, Microsoft, OpenAI). Total 2025 commitments exceed $30M.
CG grant purposes: $28.7M research, $12M regranting, $6.65M field building, $2.42M communications, $2.16M general support, $1.7M office space. In effect, CG funds the whole pipeline (research, regranting, field building, communications, and office space) through a single organization.
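As a quick cross-check, the listed purposes can be summed and compared with the $59.35M total reported above. This is a minimal sketch using only the figures quoted in this section; the variable names are illustrative, and the $28.7M research figure is rounded in the source, so small differences are expected.

```python
# Amounts in thousands of USD, copied from the figures quoted in this section.
cg_grants_by_purpose = {
    "research": 28_700,        # rounded in the source
    "regranting": 12_000,
    "field building": 6_650,
    "communications": 2_420,
    "general support": 2_160,
    "office space": 1_700,
}
reported_total = 59_350  # "18 ... grants totaling $59.35M"

itemized_total = sum(cg_grants_by_purpose.values())
unitemized = reported_total - itemized_total
research_share = cg_grants_by_purpose["research"] / itemized_total

print(f"Itemized total: ${itemized_total / 1000:.2f}M")
print(f"Not itemized:   ${unitemized / 1000:.2f}M")
print(f"Research share of itemized grants: {research_share:.1%}")
```

On these figures the itemized purposes come to about $53.6M, so roughly $5.7M of the reported total is not attributed to a listed purpose. The source does not say where that remainder falls.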
Spending: Expenses were $2.8M (2023) and ~$8.6M (2024), against revenue of $8.1M and $24.3M in the same years. Revenue ran at nearly three times expenses in both years, so the org is building substantial reserves.
Executive compensation: Gleave $229K, Berzins $183K (2024). Modest for Bay Area.
Consulting: 10% revenue cap from for-profit AI developers. 2024 program services revenue was $429K (1.8% of total). The cap is not currently binding.
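The "not currently binding" claim can be checked with simple arithmetic. The sketch below assumes the 10% cap is measured against total annual revenue, which the text implies but does not state outright; the names are illustrative.

```python
# Amounts in thousands of USD, from the 2024 figures quoted above.
total_revenue_2024 = 24_300
consulting_revenue_2024 = 429
CAP_PERCENT = 10  # assumption: the cap is a share of total annual revenue

consulting_share_pct = 100 * consulting_revenue_2024 / total_revenue_2024
cap_amount = total_revenue_2024 * CAP_PERCENT // 100
headroom = cap_amount - consulting_revenue_2024
cap_is_binding = consulting_revenue_2024 >= cap_amount

print(f"Consulting share of revenue: {consulting_share_pct:.1f}%")
print(f"Cap at {CAP_PERCENT}%: ${cap_amount}K, headroom: ${headroom}K")
print(f"Cap binding: {cap_is_binding}")
```

Under that assumption, consulting revenue could grow several times over at 2024 revenue levels before reaching the cap.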
Potential conflicts:
- The AI Safety Fund is funded by frontier labs that FAR.AI also red-teams and critiques.
- Gleave sits on the board of METR, which evaluates frontier models. If FAR.AI consults for a lab that METR evaluates, Gleave has influence on both sides.
- Consulting NDAs mean FAR.AI may possess non-public information about lab safety practices that constrains its public commentary.
- The $12M regranting program has zero public transparency about recipients.
What Others Say
Direct external criticism of FAR.AI is essentially nonexistent. Despite 40 targeted searches, no published substantive critique of FAR.AI's approach was found. The org is young (founded 2022) and has so far drawn little public scrutiny.
The closest to criticism comes from within. Gleave himself stated: "If we look at the track record of the AI safety community, it quite possibly has been harmful for the world" (Big Picture AI Safety study, 2024). The same study surfaced community self-criticism about overreliance on theory, insularity, and the safety community's relationship with AGI companies.
Zvi Mowshowitz's skepticism of defense-in-depth represents the strongest external challenge to FAR.AI's core thesis: "At best it buys us a little time. It's definitely not really going to work. What's really going to happen is all of our defenses are going to fail at the same time for the same reason." Gleave engages with this directly in the Cognitive Revolution podcast, arguing that defense-in-depth works under constrained threat models while acknowledging uncertainty about whether it scales past human-level intelligence.
Founders Pledge (Feb 2024) calls FAR.AI a "uniquely promising investment in long-term global safety." No other independent assessment was found.
What's Absent
- Regranting transparency: $12M program with zero public disclosure of recipients or grant amounts.
- Board composition: Current board members and their independence status are not publicly documented beyond the 990 filings.
- Current team roster: The team page had almost no content. Individual researchers beyond leadership are hard to identify.
- Annual report: No comprehensive public accounting of activities, spending, and outcomes.
- External criticism: Zero published critical analysis of FAR.AI's approach, strategy, or execution from independent voices.
- Theory of change document: No single canonical statement of FAR.AI's theory of change and what would falsify it.
- Consulting clients: Which labs FAR.AI has red-teamed is unknown due to NDAs.
Recommended Reading
Cognitive Revolution podcast with Adam Gleave (Sep 2025) — The most candid single source for understanding Gleave's worldview. Discusses post-AGI equilibria ("third sons of European nobility"), capability timelines, defense-in-depth thesis, lie detector research, and FAR.AI's ambitions. Start here. https://www.cognitiverevolution.ai/full-stack-ai-safety-why-defense-in-depth-might-work-with-far-ai-ceo-adam-gleave/
AI Safety in a World of Vulnerable ML Systems (Mar 2023) — FAR.AI's foundational theoretical document. Argues adversarial exploitability breaks most alignment proposals. The intellectual basis for the entire research program. https://www.far.ai/news/ai-safety-in-a-world-of-vulnerable-machine-learning-systems
Big Picture AI Safety (May 2024) — 17 expert interviews on strategic disagreements within AI safety. Includes Gleave's "quite possibly been harmful" quote and surfaced criticisms of insularity, theoretical overreliance, and AGI company relationships. The closest thing to a counterargument against the safety ecosystem FAR.AI operates within. https://www.far.ai/news/big-picture-ai-safety
AI Impacts interview with Gleave (2019) — Pre-FAR.AI probability estimates and reasoning. Reveals the moderate/skeptical foundations of his worldview. https://aiimpacts.org/conversation-with-adam-gleave/
Transparency page — Consulting policy, independence safeguards, 10% cap, NDA policy. The most forthright page about how FAR.AI manages conflicts of interest. https://www.far.ai/about/transparency