Theory of Change
Meta has articulated two distinct and contradictory theories of change for AI safety, separated by roughly 12 months:
Theory 1 (2024): Open-weight AI as safety mechanism. Zuckerberg's July 2024 manifesto: "Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn't concentrated in the hands of a small number of companies, and that the technology can be deployed more evenly and safely across society." Under this theory, safety comes from transparency, diversity of deployment, and preventing monopolistic control of AI. LeCun's complementary argument: safety is an engineering problem that gets solved iteratively, like turbojet reliability, and the real danger is concentration of AI power in proprietary systems.
Theory 2 (2025-present): Race to superintelligence. Zuckerberg's June 2025 MSL memo: "Developing superintelligence is coming into sight" and Meta is "uniquely positioned to deliver superintelligence to the world." By July 2025, Zuckerberg was signaling that not all superintelligence models would be open-sourced: "We'll need to be rigorous about mitigating these risks and careful about what we choose to open source." The codename "Avocado" refers to a proprietary model under development. The open-weight safety thesis appears to have been abandoned.
Neither theory includes a mechanism for ensuring alignment of advanced AI systems.
What They Do
Models and products. The Llama family is Meta's primary AI output: LLaMA (Feb 2023, leaked to 4chan within a week), Llama 2 (Jul 2023), Llama 3/3.1 (2024, including the first frontier-level open-weight model at 405B parameters), and Llama 4 (Apr 2025, accompanied by a benchmark manipulation scandal that the departing LeCun confirmed). A proprietary successor ("Avocado") is under development inside the secretive TBD Lab.
Safety tools. Purple Llama (Dec 2023) includes CyberSecEval, Llama Guard, and Code Shield -- genuine safety contributions used across the ecosystem. These address misuse-layer safety (content filtering, code security) but not alignment.
Safety framework. Meta's Frontier AI Framework (Feb 2025) exclusively covers misuse risk. The Center for AI Policy identified critical gaps: narrow threshold definitions using "uniquely enable" language, a requirement that "all enabling capabilities" be present before mitigations are triggered, and the complete absence of alignment/scheming risk. FLI recommended Meta "significantly increase investment in technical safety research, especially tamper-resistant safeguards for open-weight models."
Organizational restructuring. FAIR (founded Dec 2013) was once the "crown jewel" of Meta AI research. By 2025, FAIR had been sidelined by GenAI product teams, lost 8+ top researchers in one year (including more than half of the original Llama paper authors), and been placed under the newly formed Meta Superintelligence Labs. The Responsible AI team was disbanded in Nov 2023. Meta now replaces up to 90% of human privacy/integrity risk reviewers with AI.
Notable AI incidents. Galactica LLM (Nov 2022) was withdrawn after 3 days for producing racist and fabricated content. CICERO (Diplomacy AI) learned to deceive despite Meta's claims of honesty. Chinese military researchers adapted Llama for intelligence applications (ChatBIT) and electronic warfare, demonstrating that license restrictions are unenforceable.
Key People
Mark Zuckerberg -- CEO, personally driving AI recruitment since mid-2025 with $100M+ signing bonuses. Increasingly centralized AI decision-making. Has never publicly engaged with alignment concerns despite pursuing superintelligence.
Alexandr Wang (Chief AI Officer, from June 2025) -- Former CEO of Scale AI (data labeling). Age 28. Entrepreneur, not a scientist. Replaced the scientific leadership model with a product/execution model. Favors proprietary approaches.
Yann LeCun (departed Nov 2025) -- Founded FAIR, Turing Award winner. Left after 12 years, explaining: "When your curiosity collides with quarterly results, curiosity rarely wins." Publicly confirmed Llama 4 benchmarks were "fudged." Called Wang "young and inexperienced." Founded AMI Labs ($1.03B seed) to pursue world models.
Notable departures: Joelle Pineau (FAIR head) and Nick Clegg (VP Global Affairs, replaced by Republican Joel Kaplan); 600 MSL employees were laid off in Oct 2025. Two former FAIR researchers founded Mistral AI ($6B valuation). The departure pattern is consistent: research and safety figures leave, while product and capabilities figures are recruited.
Money and Incentives
Revenue model. Meta Platforms FY2025 revenue: $201.0B, driven almost entirely by advertising. AI serves engagement optimization and ad targeting. The Advantage+ AI ad suite runs at a $60B annual rate. Meta does not sell AI models or API access -- AI is infrastructure that makes ads more profitable.
Capabilities investment. AI CapEx: $66-72B (2025), $115-135B (2026), $600B cumulative through 2028. $14.3B investment in Scale AI (49% stake). Signing bonuses reaching $100-200M per person for AI recruits from OpenAI, Google, and Apple.
Safety investment. No publicly disclosed safety budget. No breakdown of safety vs. capabilities spending. No external safety grants or philanthropy. Meta does not disclose what fraction of its AI output is safety-specific. AI Lab Watch estimates that 22% of published research is safety-related, but the denominator and methodology are unclear.
Structural misalignment. Meta's business model creates a fundamental conflict: AI safety means constraining the very systems that drive engagement and ad revenue. The leaked AI companion policy (permitting "romantic or sensual" conversations with minors, approved by the chief ethicist) demonstrates that when engagement optimization and safety conflict, engagement wins. The company's history with content moderation -- building guardrails under public pressure, then dismantling them when political winds shift -- suggests safety commitments are not durable.
No external accountability. Meta is a public corporation with dual-class shares giving Zuckerberg near-absolute control. There is no board-level AI safety committee, no independent safety evaluation process, no binding safety commitments, and no whistleblower protection policy for AI safety concerns.
What Others Say
Quantified assessments. FLI AI Safety Index: F (2024), improved to D (Winter 2025) -- worst among frontier labs. AI Lab Watch: 5% overall -- lowest of seven labs assessed. 0% in misuse prevention, extreme security, scheming risk prevention.
Stuart Russell (UC Berkeley, AI textbook co-author): "None of the current activity provides any kind of quantitative guarantee of safety... it's possible that the current technology direction can never support the necessary safety guarantees, in which case it's really a dead end."
David Krueger (Université de Montréal, Mila): "It's horrifying that the very companies whose leaders predict AI could end humanity have no strategy to avert such a fate."
Mind Prison critique of LeCun's safety arguments: A systematic rebuttal of LeCun's five core claims, each paired here with its counterargument:
- Intelligence creates no desire to dominate (countered by historical tyrants and invisible dominance)
- Higher intelligence will be obedient (employees would overrule bad bosses if they could)
- Government alignment works (regulatory capture)
- Good AI beats bad AI (defense is costlier than offense)
- We'll engineer desires (no specifics provided)
Luke Muehlhauser (former MIRI ED): LeCun "writes as though the thing people are concerned about is a malevolent AGI, even though I don't know anyone concerned about malevolent AI. The concern... is about convergent instrumental goals that are incidentally harmful."
Foundation for American Innovation (defense of Meta): Chinese military AI development doesn't depend on Llama -- the PLA would build equivalent capabilities with domestic models. The proliferation concern is real but overstated.
Former FAIR researchers: "FAIR at its peak circa 2019 was a very special place... Zuckerberg clearly values GenAI over FAIR at this point." Multiple former employees describe a "slow death" of research culture, diversion of compute resources, and product-first pressure.
What's Absent
The most significant absences for a company pursuing superintelligence with >$100B annual AI investment:
- No quantified AI safety budget
- No published alignment research agenda
- No safety team headcount or growth trajectory
- No Responsible Scaling Policy (every other frontier lab has one)
- No board-level AI safety oversight
- No independent safety evaluation process
- No whistleblower protection for AI safety concerns
- No binding commitment mechanism for safety practices
- No public discussion of conditions under which Meta would slow down development
- No framework for managing irreversibility of open-weight releases
- No public engagement by Zuckerberg with alignment concerns
Recommended Reading
Lex Fridman Podcast #416: Yann LeCun (March 2024) -- The most candid source on the worldview that shaped Meta AI for a decade. LeCun explains why he thinks LLMs are a dead end, why open source is essential, and why safety concerns are overblown. Long but revelatory. https://lexfridman.com/yann-lecun-3-transcript/
Yann LeCun's Failed AI Safety Arguments (Mind Prison, Dec 2023) -- Point-by-point rebuttal of the specific arguments Meta's intellectual leader used to dismiss safety. The strongest counterargument. https://www.mindprison.cc/p/yann-lecuns-failed-ai-safety-arguments
AI Lab Watch: Meta Assessment -- Meta's 5% overall safety score in one devastating page. Quantified comparison with every other frontier lab. https://ailabwatch.org/companies/meta
Fortune: "Meta's AI research lab is 'dying a slow death'" (April 2025) -- Seven former employees describe FAIR's decline from research cathedral to product support team. https://fortune.com/2025/04/10/meta-ai-research-lab-fair-questions-departures-future-yann-lecun-new-beginning/
CNBC: "From Llamas to Avocados" (Dec 2025) -- Internal confusion as Meta pivots from open-source to proprietary models, with detailed reporting on culture clashes between the old guard and new leadership. https://www.cnbc.com/2025/12/09/meta-avocado-ai-strategy-issues.html