FAR.AI

Empirical Research

Adversarial robustness. Residency program.

Founded: 2022
HQ: Berkeley, CA
Team: 15
Structure: 501(c)(3) nonprofit
Model: Grants

Theory of Change

FAR.AI's theory of change rests on one core observation: all contemporary ML systems are adversarially exploitable, and this breaks most alignment proposals. Because alignment schemes like RLHF, debate, and scalable oversight depend on helper ML systems overseeing the main system, an exploitable helper means alignment itself is compromised. FAR.AI calls this the central problem and proposes two solutions: improving adversarial robustness and developing "fault tolerant alignment" methods that work even with vulnerable systems.

In practice, this translates to a "defense-in-depth" approach spanning the full stack from research through engineering to policy. CEO Adam Gleave articulates it this way: "What makes us unique is that we are willing to do the hard thing and operate across all of those different layers of the stack, from groundbreaking exploratory research to the nitty-gritty engineering scaling work through to more advocacy and sales." The explicit goal is to develop safety techniques, prove they work at scale, get labs to adopt them, and then get governments to mandate them as standards.

Gleave's worldview is pragmatically optimistic. He estimates median timelines of 5-7 years for dangerous autonomous agents and ~14 years for fully autonomous AI organizations. He believes "a rigorous engineering approach" to defense-in-depth can work against constrained threat models, though he acknowledges it may not scale past human-level intelligence: "I don't know if you get past human-level intelligence safely with this."

What They Do

Research. 30+ papers since July 2022 founding. Signature work includes the adversarial attack on superhuman Go AI KataGo, the STACK methodology showing 71% bypass rate against multi-layer defenses (where conventional attacks scored 0%), lie detector training that can make models more honest or better liars depending on training choices, and jailbreak-tuning demonstrating that guardrails on frontier models can be cheaply removed.

Red-teaming. Testing defense-in-depth systems, including their own. The STACK work with UK AISI found that off-the-shelf Gemma 2 with few-shot prompting outperformed purpose-built safety models as a defense layer.

Regranting. $12M from Coefficient Giving (CG, formerly Open Philanthropy) for AI safety grantmaking. Nomination-only, with no public list of recipients.

Field building. Alignment Workshop series (80 to 300+ researchers, expanding to 3/year including Asia). ControlConf (with Redwood Research and UK AISI). TIAP policy conference in Washington DC.

Coworking. FAR.Labs in Berkeley hosting ~40 members including AI Impacts, MATS, and independent researchers.

Policy. Selected to lead the EU AI Act CBRN risk consortium (3-year contract, Feb 2026). Co-organized IDAIS bringing together Western and Chinese AI scientists. Spun out Safe AI Forum (SAIF) as independent 501(c)(3).

Consulting. Red-teams frontier labs under NDA. Charges market rates and applies a 10% cap on revenue from for-profit AI developers. 2024 consulting revenue: $429K.

Key People

Adam Gleave (CEO, co-founder). PhD Berkeley under Stuart Russell. Former Jane Street and GSA Capital intern, DeepMind researcher (with Jan Leike, Geoffrey Irving). Board member of METR, SAIF, and LISA. Schmidt AI2050 Fellow. Compensation $229K (2024). In a 2019 interview gave 10% probability to MIRI-style hard alignment and 40% weight to traditional AI risk arguments being unconvincing — more moderate than many safety org founders.

Karl Berzins (President, co-founder). Non-technical; background in corporate strategy (Advanced Navigation, Swire). Compensation $183K (2024).

Ethan Perez (collaborator, not staff). Anthropic research scientist. Early CG grants (~$2.2M, 2021-2024) funded research "led by" Perez through FAR.AI. He appears as a co-PI rather than an employee.

Team size was 11.5 FTEs as of late 2023, with growth to 30+ researchers expected by 2026-2027. The org is hiring a COO to support scaling.

Money and Incentives

Revenue trajectory: $1.5M (2022) to $8.1M (2023) to $24.3M (2024). Net assets: $22.5M as of end 2024.

Funding concentration: 18 Coefficient Giving (formerly Open Philanthropy) grants totaling $59.35M (Oct 2021 - Sep 2025) constitute the vast majority of all funding. The largest single grant is $28.675M (Sep 2025) for 3 years. CG/OP is the dominant funder by a wide margin.

Diversification (2025+): New funders include Schmidt Sciences, Survival and Flourishing Fund (Jaan Tallinn), CSET Georgetown, and the AI Safety Fund (supported by the Frontier Model Forum — Anthropic, Google, Microsoft, OpenAI). Total 2025 commitments exceed $30M.

CG grant purposes: $28.7M research, $12M regranting, $6.65M field building, $2.42M communications, $2.16M general support, $1.7M office space. CG funds the entire FAR.AI value chain through a single organization.

Spending: Expenses were $2.8M (2023) and ~$8.6M (2024). The org is accumulating assets faster than it spends — building substantial reserves.
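
As a rough check on the reserves claim, the sketch below subtracts reported expenses from reported revenue for 2023 and 2024. It uses only the figures quoted in this profile (in millions of USD). The variable names are illustrative, and the 2024 expense figure is approximate.

```python
# Revenue and expenses in millions of USD, as quoted in this profile.
revenue = {2023: 8.1, 2024: 24.3}
expenses = {2023: 2.8, 2024: 8.6}  # the 2024 figure is approximate

# Annual surplus and the fraction of revenue actually spent each year.
surplus = {year: round(revenue[year] - expenses[year], 1) for year in revenue}
spend_ratio = {year: round(expenses[year] / revenue[year], 2) for year in revenue}

for year in sorted(surplus):
    print(f"{year}: surplus ${surplus[year]}M, spent {spend_ratio[year]:.0%} of revenue")
```

On these figures the org spent about 35% of its revenue in each year, leaving surpluses of roughly $5.3M and $15.7M.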

Executive compensation: Gleave $229K, Berzins $183K (2024). Modest for Bay Area.

Consulting: 10% revenue cap from for-profit AI developers. 2024 program services revenue was $429K (1.8% of total). The cap is not currently binding.
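
To see how far the cap is from binding, the sketch below compares 2024 program services revenue with a 10% ceiling. It assumes the cap is measured against total annual revenue and, as an upper bound, treats all program services revenue as coming from for-profit developers. Neither assumption is confirmed by the figures above.

```python
# Assumption: the 10% cap is measured against total annual revenue.
CAP_SHARE = 0.10
total_revenue_2024 = 24_300_000
# Upper bound: treats all program services revenue as for-profit consulting.
program_services_revenue_2024 = 429_000

share = program_services_revenue_2024 / total_revenue_2024
headroom = CAP_SHARE * total_revenue_2024 - program_services_revenue_2024

print(f"share of total revenue: {share:.1%}")
print(f"headroom before the cap binds: ${headroom:,.0f}")
```

Under those assumptions, consulting could grow by about $2.0M at 2024 revenue levels before reaching the cap.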

Potential conflicts:

  • The AI Safety Fund is funded by frontier labs that FAR.AI also red-teams and critiques.
  • Gleave sits on the board of METR, which evaluates frontier models. If FAR.AI consults for a lab that METR evaluates, Gleave has influence on both sides.
  • Consulting NDAs mean FAR.AI may possess non-public information about lab safety practices that constrains its public commentary.
  • The $12M regranting program has zero public transparency about recipients.

What Others Say

Direct external criticism of FAR.AI is essentially nonexistent. Despite 40 targeted searches, no published substantive critique of FAR.AI's approach was found. The org is young (founded 2022) and has grown under the radar of public scrutiny.

The closest to criticism comes from within. Gleave himself stated: "If we look at the track record of the AI safety community, it quite possibly has been harmful for the world" (Big Picture AI Safety study, 2024). The same study surfaced community self-criticism about overreliance on theory, insularity, and the safety community's relationship with AGI companies.

Zvi Mowshowitz's skepticism of defense-in-depth represents the strongest external challenge to FAR.AI's core thesis: "At best it buys us a little time. It's definitely not really going to work. What's really going to happen is all of our defenses are going to fail at the same time for the same reason." Gleave engages this directly in the Cognitive Revolution podcast, arguing it works under constrained threat models but acknowledging uncertainty about scaling.

Founders Pledge (Feb 2024) calls FAR.AI a "uniquely promising investment in long-term global safety." No other independent assessment was found.

What's Absent

  • Regranting transparency: $12M program with zero public disclosure of recipients or grant amounts.
  • Board composition: Current board members and their independence status are not publicly documented beyond the 990 filings.
  • Current team roster: The team page had almost no content. Individual researchers beyond leadership are hard to identify.
  • Annual report: No comprehensive public accounting of activities, spending, and outcomes.
  • External criticism: Zero published critical analysis of FAR.AI's approach, strategy, or execution from independent voices.
  • Theory of change document: No single canonical statement of FAR.AI's theory of change and what would falsify it.
  • Consulting clients: Which labs FAR.AI has red-teamed is unknown due to NDAs.

Recommended Reading

  1. Cognitive Revolution podcast with Adam Gleave (Sep 2025) — The most candid single source for understanding Gleave's worldview. Discusses post-AGI equilibria ("third sons of European nobility"), capability timelines, defense-in-depth thesis, lie detector research, and FAR.AI's ambitions. Start here. https://www.cognitiverevolution.ai/full-stack-ai-safety-why-defense-in-depth-might-work-with-far-ai-ceo-adam-gleave/

  2. AI Safety in a World of Vulnerable ML Systems (Mar 2023) — FAR.AI's foundational theoretical document. Argues adversarial exploitability breaks most alignment proposals. The intellectual basis for the entire research program. https://www.far.ai/news/ai-safety-in-a-world-of-vulnerable-machine-learning-systems

  3. Big Picture AI Safety (May 2024) — 17 expert interviews on strategic disagreements within AI safety. Includes Gleave's "quite possibly been harmful" quote and surfaced criticisms of insularity, theoretical overreliance, and AGI company relationships. The closest thing to a counterargument against the safety ecosystem FAR.AI operates within. https://www.far.ai/news/big-picture-ai-safety

  4. AI Impacts interview with Gleave (2019) — Pre-FAR.AI probability estimates and reasoning. Reveals the moderate/skeptical foundations of his worldview. https://aiimpacts.org/conversation-with-adam-gleave/

  5. Transparency page — Consulting policy, independence safeguards, 10% cap, NDA policy. The most forthright page about how FAR.AI manages conflicts of interest. https://www.far.ai/about/transparency

Claude's Analysis
An opinionated read. Read the brief first to form your own view.

Stated Theory of Change

FAR.AI's stated theory has three layers:

  1. The problem: All ML systems are adversarially exploitable. This exploitability breaks most alignment proposals because they depend on helper systems (reward models, oversight models, interpretability tools) that can themselves be hacked by the main system.

  2. The solution: Develop safety techniques (defense-in-depth, robustness improvements, fault-tolerant alignment, lie detectors, red-teaming methodologies) that provide demonstrable guarantees.

  3. The pipeline: Research alone is insufficient. FAR.AI spans the entire value chain from exploratory research through scaled engineering proof-of-concept to field building and policy advocacy. The explicit goal is: develop the technique, prove it works, get labs to adopt it, then get governments to mandate it.

The mechanism is engineering rigor applied to safety. Gleave's core claim is that current defense-in-depth failures stem from sloppy implementation rather than fundamental impossibility, and that with "proper planning, meticulous experimental design, and at least some willingness to accept performance trade-offs," the defense-in-depth plan can work.

Revealed Theory of Change

Actions largely align with stated theory, with some notable divergences:

Alignment: The research portfolio matches the stated agenda well. Adversarial robustness (Go attack, scaling laws), red-teaming (STACK, jailbreak-tuning), deception detection (lie detectors), and defense-in-depth evaluation all directly serve the stated theory.

Divergence 1: Fault-tolerant alignment deprioritized. The March 2023 post "AI Safety in a World of Vulnerable ML Systems" named fault-tolerant alignment as the area they were "most excited by." But actual research output has tilted heavily toward applied red-teaming and defense testing. This may reflect a pragmatic judgment that red-teaming has more immediate impact, but it means the most theoretically novel part of FAR.AI's original agenda has received less attention.

Divergence 2: Field building has grown faster than research. CG grants include $6.65M for field building, $2.42M for communications, $1.7M for office space, and $12M for regranting, totaling $22.8M for non-research activities versus $28.7M for research (plus $2.16M general support and $1.3M for specific research projects). Roughly 40% of CG funding goes to activities other than in-house research. FAR.AI is becoming as much a field-building and coordination organization as a research lab.
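
The split can be recomputed from the itemized amounts. The sketch below uses only the dollar figures quoted in this profile (in millions of USD). The grouping into "non-research" and "research and general" follows the paragraph above, and the dictionary names are illustrative.

```python
# CG grant purposes in millions of USD, as itemized in this profile.
non_research = {
    "regranting": 12.0,
    "field building": 6.65,
    "communications": 2.42,
    "office space": 1.7,
}
research_and_general = {
    "research": 28.7,
    "general support": 2.16,
    "specific research projects": 1.3,
}
REPORTED_TOTAL = 59.35  # total across all 18 CG grants, as reported

non_research_total = sum(non_research.values())
itemized_total = non_research_total + sum(research_and_general.values())

# Non-research share against both possible denominators.
share_of_itemized = non_research_total / itemized_total
share_of_reported = non_research_total / REPORTED_TOTAL

print(f"non-research: ${non_research_total:.2f}M")
print(f"share of itemized total (${itemized_total:.2f}M): {share_of_itemized:.1%}")
print(f"share of reported total (${REPORTED_TOTAL:.2f}M): {share_of_reported:.1%}")
```

The non-research share comes out at about 41% of the itemized amounts and about 38% of the reported total, so it is somewhat below half on either basis. The itemized amounts sum to about $54.9M, roughly $4.4M short of the reported $59.35M. The figures in this profile do not explain the gap.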

Divergence 3: Government work is growing. The EU AI Act CBRN consortium is a significant commitment. Government contracts create different incentive structures than philanthropic grants: accountability to bureaucratic timelines rather than research novelty. This may be a strategic evolution toward becoming a "private sector quasi-regulatory body," as Cognitive Revolution host Nathan Labenz suggested.

Key Assumptions

1. Defense-in-depth can be made to work with careful engineering.

  • Evidence for: STACK paper shows that implementation failures, not fundamental impossibility, are the main vulnerability. Lie detector research shows training conditions can make models genuinely more honest.
  • Evidence against: Zvi Mowshowitz's critique that defenses will fail simultaneously. FAR.AI's own finding that under bad training conditions, lie detector training produces 95% deception rates. Adversarial robustness has been an open problem for a decade with limited progress.
  • Testable: Yes. Can FAR.AI produce a defense that withstands a well-resourced red team?
  • What changes if wrong: The entire research program loses its foundation. If defense-in-depth fundamentally cannot work, FAR.AI's full-stack approach has no end product to push through the pipeline.

2. There is enough time for iterative safety engineering.

  • Evidence for: Gleave's median of 14 years for fully autonomous AI organizations.
  • Evidence against: Many in the field estimate shorter timelines. GPT-5/Claude 4 already showing scheming behavior. If org-level AI arrives in 5 years instead of 14, the "iterate and improve" strategy is too slow.
  • Testable: Partially — progress on agentic AI benchmarks provides ongoing data.
  • What changes if wrong: The urgent priority shifts from iterative engineering to crude but immediate safety measures (pauses, compute governance, hard capability limits).

3. Labs will adopt safety techniques when shown they work.

  • Evidence for: FAR.AI already collaborates with UK AISI and EU AI Office. Labs participate in defense-in-depth testing.
  • Evidence against: Gleave himself notes "it really has been kind of just-in-time safety" and "there's a frog-boiling effect." Labs have economic incentives to cut corners. No evidence that FAR.AI's specific techniques have been adopted by major labs.
  • Testable: Track whether labs implement FAR.AI's recommendations from the STACK paper.
  • What changes if wrong: The "push through the pipeline" strategy fails at the adoption stage, and policy mandates become the only path.

4. A single organization can effectively span research, field building, consulting, regranting, and policy.

  • Evidence for: FAR.AI's rapid growth and institutional endorsement (EU contract, diverse funders).
  • Evidence against: No comparable organization has successfully operated at this breadth at scale. Each activity has different optimization pressures (research: novelty; consulting: client satisfaction; policy: consensus-building; regranting: evaluation rigor).
  • Testable: Whether research quality is maintained as the org doubles in size.
  • What changes if wrong: Spreading too thin produces adequate work in many areas but frontier work in none.

Strengths

Intellectual honesty. Gleave's willingness to commission and publish the Big Picture AI Safety study, which surfaced extensive criticism of the safety community, is unusual. His statement that the AI safety community's track record "quite possibly has been harmful for the world" shows genuine epistemic humility. The org publishes research that undermines its own products (STACK breaking their own defense-in-depth pipeline).

Strong empirical foundations. The Go attack on KataGo is a landmark result: the first demonstration that superhuman AI systems have catastrophic adversarial vulnerabilities. The lie detector work provides actionable guidance. The research is grounded in experiments rather than theoretical speculation.

Strategic positioning. FAR.AI sits at the intersection of research, policy, and field building with relationships across governments (EU, UK AISI, Singapore CSA), academia (Berkeley, Cambridge), and labs (via consulting). The EU AI Act contract is a genuine institutional achievement for a 3-year-old organization.

Financial cushion. $22.5M in net assets provides substantial runway. Funding is diversifying beyond CG. Compensation is reasonable. The 10% consulting cap protects against lab capture.

Pragmatic worldview. Gleave's moderate skepticism of traditional AI risk arguments makes him credible to a broader audience than more alarmed safety researchers. The emphasis on engineering rigor over theoretical arguments appeals to ML researchers who may dismiss more abstract concerns.

Weaknesses and Risks

Extreme funder concentration. CG/OP accounts for $59.35M of total funding. While diversification is beginning, any significant shift in CG priorities would be existentially threatening to FAR.AI. CG is not just the primary funder — it funds research, regranting, events, office space, and communications. This level of dependence on a single philanthropic actor is a structural vulnerability.

Regranting opacity. A $12M discretionary fund with no public disclosure of recipients is a serious transparency gap. For an organization that has a detailed consulting transparency policy, the silence on regranting is conspicuous.

Breadth vs. depth tension. FAR.AI is simultaneously a research lab, an events organization, a coworking space, a regranting body, a policy shop, a consulting firm, and a communications team. With 15-30 researchers, doing all of this at frontier quality is extremely difficult. The risk is producing competent-but-not-groundbreaking work across many areas rather than field-defining work in any single one.

Governance concerns. Gleave's simultaneous board seats at FAR.AI, METR, SAIF, and LISA create concentrated influence across multiple safety organizations. The board composition beyond officers is unclear. For an org receiving $60M, governance transparency should be stronger.

Defense-in-depth may be a dead end. If Zvi Mowshowitz's critique is correct (that defenses fail together for a shared reason rather than independently), then FAR.AI's core research agenda addresses a local optimum rather than the actual problem. The org's own STACK paper shows 71% bypass rates, which could be read as "defense-in-depth doesn't work even when carefully implemented."
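
A toy calculation shows why the correlation question matters so much. The sketch below is illustrative only: the per-layer bypass probabilities are hypothetical, not measurements from STACK or any other FAR.AI paper, and it compares two idealized extremes (fully independent layers versus perfectly correlated ones).

```python
import math

def bypass_if_independent(layer_bypass_probs):
    """Chance an attack passes every layer when layers fail independently."""
    return math.prod(layer_bypass_probs)

def bypass_if_fully_correlated(layer_bypass_probs):
    """Chance an attack passes every layer under perfect correlation:
    any attack that beats the strongest layer also beats all the others."""
    return min(layer_bypass_probs)

# Hypothetical per-layer bypass probabilities for a three-layer defense.
layers = [0.2, 0.2, 0.2]

independent = bypass_if_independent(layers)
correlated = bypass_if_fully_correlated(layers)

print(f"independent layers: {independent:.1%} of attacks get through")
print(f"perfectly correlated layers: {correlated:.1%} of attacks get through")
```

With these made-up numbers, stacking three independent layers cuts the bypass rate from 20% to 0.8%, while three perfectly correlated layers do no better than one. Real systems sit somewhere between the two extremes, and where they sit is the substance of the disagreement.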

Optimism bias. Gleave's relatively optimistic timelines (14-year median to org-level AI) and relatively optimistic assessment of defense-in-depth may create a false sense of security. If timelines are shorter and defenses weaker than assumed, the engineering-iteration approach fails.

Cross-References

METR: Gleave sits on METR's board. METR evaluates frontier models; FAR.AI red-teams and consults for frontier labs. This creates a web of influence across evaluation/testing organizations that should be watched.

Redwood Research: Co-organized ControlConf. FAR.AI actively builds the "AI control" field that Redwood pioneered. Complementary rather than competing.

UK AISI / EU AI Office: Collaborative relationships via STACK paper and CBRN consortium. FAR.AI is positioning itself as a trusted technical partner to government AI safety institutes.

Anthropic: Ethan Perez (Anthropic researcher) channels CG-funded research through FAR.AI. AI Safety Fund (partially funded by Anthropic) provides grants to FAR.AI. Defense-in-depth research tests Claude models. Multiple touchpoints.

CHAI (Berkeley): Gleave's PhD lineage under Stuart Russell. Several board members from Berkeley. Deep institutional roots.

Coefficient Giving: Primary funder by an extreme margin. CG's willingness to fund the full value chain through FAR.AI suggests very high confidence in Gleave and the team.

What Would Change This Assessment

  • FAR.AI produces a defense that withstands a serious red team. If defense-in-depth can be shown to work against a well-resourced adversary (not just their own synthetic tests), this validates the core thesis. Currently the best result is an attack succeeding 71% of the time, which is not confidence-inspiring.

  • A major lab publicly adopts a FAR.AI technique. The full-stack pipeline theory requires labs to actually use the research. Evidence of adoption would strongly validate the approach.

  • Regranting recipients are disclosed. Transparency about the $12M program would address one of the biggest governance concerns.

  • An independent evaluation finds the breadth is working. External evidence that quality is maintained across all activities would address the "spread too thin" concern.

  • Gleave's timeline estimates prove wrong. If autonomous agents arrive in 2-3 years rather than 5-7, the iterative engineering approach may be insufficient.

  • A notable researcher departs with public criticism. Would provide signal about internal culture and strategic disagreements currently invisible from outside.

Self-Critique

What I might be missing:

  • The absence of forum discussion in the evidence means I may be missing community sentiment about FAR.AI — both praise and criticism.
  • I may be under-weighting the significance of the EU AI Act contract. Government institutional endorsement of this kind is rare for a 3-year-old safety org and may signal quality that isn't visible in the available evidence.
  • I may be over-weighting the "spread too thin" concern. Startup-style organizations do sometimes succeed at breadth through strong execution. The 30+ papers in 3 years suggests research output has not suffered yet.

Where this analysis is potentially biased:

  • I am inclined to be skeptical of defense-in-depth because the evidence I read (STACK paper, Zvi's critique) highlights failures. But the same body of evidence also contains Gleave's detailed response about when and why it can work under constrained threat models.
  • The extreme funder concentration reads as alarming in isolation, but many successful research orgs (including early Anthropic, early OpenAI) were heavily dependent on single funders before diversifying. FAR.AI is actively diversifying.

Single weakest claim: That FAR.AI is "spread too thin." I have no direct evidence of quality suffering. The concern is based on the number of activities relative to team size, which is a reasonable inference but not an observed fact.

What would most change my view: Published regranting data showing thoughtful, well-evaluated grants would significantly increase my confidence in the organization's governance. Conversely, evidence that research quality has declined during the rapid growth phase would decrease it.

Connected to (13)

UK AI Security Institute: collaborator
Centre for the Governance of AI: collaborator
SaferAI: collaborator
SecureBio: collaborator
Redwood Research: collaborator
Safe AI Forum: spun off from · Adam Gleave
AI Risk Mitigation Fund: advisor at · Adam Gleave
Anthropic: collaborator · Ethan Perez
London Initiative for Safe AI: board overlap · Adam Gleave
METR: board overlap · Adam Gleave
Timaeus: advisor at · Adam Gleave
Center for Human-Compatible AI: staff from · Adam Gleave
Players Philanthropy Fund: fiscal sponsor