
Haize Labs

Evals/Testing

Automated red-teaming. YC-backed.

Founded: 2023
HQ: New York, NY
Team: 12
Structure: C-corp
Model: VC investment

Theory of Change

Haize Labs' stated theory of change rests on two premises: (1) frontier AI labs cannot be trusted to evaluate their own models because of misaligned incentives, and (2) third-party automated red-teaming can fill this gap at scale.

Leonard Tang, co-founder and CEO: "One of the reasons we started the company was because we felt that you could not take the frontier labs at their face value, when they said that they were the best and safest model. Somebody needs to be a third-party red teamer, third-party tester, third-party evaluator for models, that was totally divorced from any sort of conflict of interest."

Tang frames safety as "trust -- ensuring AI systems perform reliably as intended" and argues that existing safety approaches are "too heavy-handed and too general," advocating instead for context-specific, domain-customized safety testing. The company's tagline has evolved from "Bad day to be a language model" (launch, June 2024) to "Deploy 99.9% Reliable AI" (current), reflecting a shift from adversarial safety testing toward enterprise reliability.

The broader commercial strategy, articulated by an external analyst, is to "use safety testing as the entry point to work with large enterprises and open the door to the real opportunity -- becoming the ultimate LLM reliability platform."

What They Do

Red-teaming frontier models. Haize red-teamed OpenAI's o1 for about one month (August-September 2024) using automated pipelines, and red-teamed Anthropic's Constitutional Classifiers prototypes (an estimated 3,000+ hours in total across Haize, Gray Swan, and UK AISI). Tang confirmed that OpenAI paid Haize for the o1 work.

Original attack research. Developed ACG (38x speedup over GCG for adversarial attacks), bijection learning (ICLR; 86.3% attack success rate on Claude 3.5 Sonnet; grows STRONGER with model scale), Cascade (automated multi-turn jailbreaking), string compositions, and DSPy red-teaming (44% attack success rate, 4x baseline).

Evaluation tools. Created Verdict (open-source judge-time compute scaling; GPT-4o+Verdict beats GPT-4o by 14.5%) and the Red-Teaming Resistance Leaderboard with HuggingFace.

Enterprise products. Suite includes Judges (evaluation), Haize (red-teaming), Monitor (production monitoring), Robustify (remediation), and Sphynx (hallucination testing).

The inverse scaling finding is their most scientifically significant result. Bijection learning demonstrates that more capable models are MORE vulnerable to this attack class, not less. Tang: "The bigger and more capable the model is, the more vulnerable it is to our bijection learning attack." At least for this attack class, the result directly challenges the narrative that scaling capabilities improves safety.
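For readers unfamiliar with the term, the encoding idea behind a "bijection" can be illustrated with a short sketch. It assumes the bijection is a random one-to-one mapping over lowercase letters. The function names and the `num_fixed` parameter (how many letters map to themselves) are hypothetical and chosen for this illustration only. This is not Haize's implementation, and it omits the in-context prompting that the attack relies on.

```python
import random
import string


def make_bijection(num_fixed: int, seed: int = 0) -> dict[str, str]:
    """Build a random one-to-one mapping over the lowercase letters.

    num_fixed letters map to themselves; every other letter maps to a
    different letter. num_fixed is a hypothetical knob for this sketch.
    """
    letters = list(string.ascii_lowercase)
    num_moved = len(letters) - num_fixed
    if not 0 <= num_fixed <= len(letters) or num_moved == 1:
        raise ValueError("num_fixed must be in 0..26 and cannot be 25")
    rng = random.Random(seed)
    moved = rng.sample(letters, num_moved)
    mapping = {c: c for c in letters}
    # Rotating the sampled letters by one position sends each moved letter
    # to a different letter while keeping the mapping one-to-one.
    for src, dst in zip(moved, moved[1:] + moved[:1]):
        mapping[src] = dst
    return mapping


def encode(text: str, mapping: dict[str, str]) -> str:
    """Apply the mapping; characters outside it pass through unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


def decode(text: str, mapping: dict[str, str]) -> str:
    """Invert the mapping; this works only because it is one-to-one."""
    inverse = {v: k for k, v in mapping.items()}
    return "".join(inverse.get(ch, ch) for ch in text)


mapping = make_bijection(num_fixed=10, seed=1)
message = "hello world"
print(encode(message, mapping))
print(decode(encode(message, mapping), mapping) == message)
```

Because the mapping is one-to-one, decoding always recovers the original text. Fewer fixed letters give a harder encoding, which is one plausible way such an attack could be tuned to a model's capability.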

Key People

Leonard Tang (Co-Founder, CEO, ~24 years old). Harvard CS+Math '23. Turned down Stanford PhD. 1,004 Google Scholar citations. Internships at NVIDIA, Snap, Cubist Systematic Strategies (quant research). Co-authored with Dan Hendrycks and worked loosely with Andy Zou (Gray Swan). Forbes 30 Under 30 AI (2025). Strong research pedigree but no prior startup or management experience. Public statements frame safety as reliability/trust rather than existential risk.

Co-founders Steve Li (Harvard, BAIR researcher, Roblox/Instagram) and Richard Liu (Harvard; almost no public information) round out the founding team. Nimit Kalra (research scientist, former quant, co-author on Verdict and Constitutional Classifiers papers) departed and is now working independently. Professor He He (NYU) joined as advisor in October 2025.

Team size: ~10-14 employees in New York City (74 Broad Street, Lower Manhattan).

Money and Incentives

Total funding: $12.5M seed round. Initial ~$7M closed August 2024, led by General Catalyst, extended to $12.5M by December 2025. Post-money valuation: $100M, achieved seven months after founding. "Investors reportedly competing on price."

Legal structure: For-profit C-Corp. Zero philanthropic grants. Zero Coefficient Giving funding. No 990 data. Purely VC-funded. As a conventional C-Corp, its fiduciary duties run to shareholders; nothing in its legal structure binds it to a safety mission.

Revenue: Multi-million dollar contracts with frontier labs (OpenAI, Anthropic, Scale AI, AI21 Labs) and enterprises (Deloitte, MongoDB). No specific revenue figures disclosed.

Investors: General Catalyst, Soma Capital, AI Grant, Stage2 Capital, Pear VC, and others. Angel investors include founders/CEOs of Replit, Cognition (Devin), Pika, HuggingFace, and Okta. Several angels run AI companies that could be Haize customers, creating potential conflicts of interest that Haize has not publicly addressed.

Incentive misalignment (critical): Haize claims to be "totally divorced from any sort of conflict of interest," but is paid by the labs it evaluates. Tang confirmed Haize was paid by OpenAI for o1 red-teaming. Tang also admitted: "We actually have no idea what our testing led into on the OpenAI side. For all we know, we could have just been a final eval dataset... or our data could have been directly baked into the post-training process. We actually have no idea." The labs control what findings are disclosed publicly (via NDAs) and whether findings influence deployment decisions. Haize has no mechanism to compel action on its findings.

Business trajectory: The company appears to be pivoting from pure red-teaming (safety-focused) toward a broader enterprise reliability platform (commercially focused). This follows the standard startup pattern of landing with a differentiated product then expanding into a larger market.

What Others Say

"Red-Teaming for GenAI: Silver Bullet or Security Theater?" (2024, academic survey): "Prior methods and practices of AI red-teaming diverge along several axes... gestures towards red-teaming as a panacea for every possible risk verge on security theater." Most critically: in no case analyzed did red-teaming result in a decision not to release a model. The paper does not name Haize but the critique applies directly to their field.

"AI red-teaming is a sociotechnical problem" (CACM): Argues commercial red teams face structural conflicts of interest, NDA constraints limit public disclosure, and the comparison to content moderation's well-documented problems is instructive. Notes that when labor is outsourced, "it becomes harder to trace and trickier to assert labor protections for it."

TechPolicy Press: Questions whether red-teaming "genuinely interrogates system risk or primarily fulfills externally imposed accountability rituals."

AngelsRound (bull case): "Addressing a rapidly growing market with regulatory tailwinds. Strong early revenue with multiple high-value contracts."

AngelsRound (bear case): "Small team, reliance on narrow set of high-value contracts, competing against AI companies that might develop in-house solutions."

Nathan Labenz (Cognitive Revolution): "I think we're headed in [the direction of truly independent testing], ultimately, and probably for good reason" -- implicitly acknowledging that current commercial red-teaming, including Haize, is not truly independent.

No prominent external voices have directly criticized Haize Labs as a specific company. All field-level criticism applies to them structurally.

What's Absent

  • Tang's personal views on existential risk. His "Crystallizing AI Risk" blog post exists but is inaccessible (JS-rendered site). All public statements frame safety as "trust" and "reliability," never as existential risk or catastrophic risk reduction.
  • Zero engagement with the AI safety research community. No EA Forum, LessWrong, or Alignment Forum presence. No Coefficient Giving funding. Not part of the AI safety ecosystem in any traditional sense.
  • No whistleblower mechanism or dangerous-capability disclosure protocol. Tang was asked directly and acknowledged the gap.
  • No governance documentation. No board composition, conflict of interest policy, or recusal procedures publicly available.
  • No evidence their findings have ever changed a deployment decision. Tang admits they don't know how their findings are used.
  • No policy engagement. No lobbying, testimony, or regulatory participation despite operating in a space where regulation would benefit them.
  • Richard Liu (co-founder) has almost no public presence despite being a Forbes 30 Under 30 honoree.

Recommended Reading

  1. Cognitive Revolution Podcast: "Red Teaming o1" (Sep 2024) -- The most candid source. Tang, Ewart, and Huang discuss founding motivation, attack methods, o1 red-teaming experience, evaluation limitations, and the inverse scaling finding. Key moment: Tang admits they have no idea how their findings are used. https://www.cognitiverevolution.ai/red-teaming-o1-part-1-2-automated-jailbreaking-w-haize-labs-leonard-tang-aidan-ewart-brian-huang/

  2. "Red-Teaming for GenAI: Silver Bullet or Security Theater?" (arXiv, 2024) -- The strongest intellectual challenge to Haize's entire field. Finds red-teaming definitions are vague, methods diverge wildly, and no case resulted in a decision not to release a model. https://arxiv.org/html/2401.15897v3

  3. Sherwood News: "The crash test dummies for new AI models" (Oct 2024) -- Best critical journalism on the field. Surfaces the independence paradox directly from Tang's own quotes. https://sherwood.news/tech/ai-regulation-red-teaming-model-safety-checks/

  4. "Endless Jailbreaks with Bijection Learning" (ICLR) -- Haize's most important paper. The finding that bigger models are more vulnerable has deep implications. https://arxiv.org/html/2410.01294v1

Claude's Analysis
An opinionated read. Read the brief first to form your own view.

Stated Theory of Change

Haize's stated theory of change is:

  1. Frontier AI labs have misaligned incentives -- speed to market trumps safety testing.
  2. Internal red-teaming is insufficient because labs can't objectively evaluate their own models.
  3. An independent, automated third-party red-teaming company can fill this gap.
  4. By discovering vulnerabilities before deployment, Haize reduces the harm from AI systems in the real world.
  5. The mechanism: scalable automated adversarial testing reveals failure modes that labs miss or choose not to prioritize.

The causal chain from Haize's work to AI risk reduction runs: Haize discovers vulnerabilities --> labs receive vulnerability reports --> labs fix vulnerabilities --> models are safer when deployed --> real-world harm is reduced.

Revealed Theory of Change

The actions reveal a different, commercially pragmatic theory:

Red-teaming as market entry, reliability as the destination. The tagline evolution ("Bad day to be a language model" to "Deploy 99.9% Reliable AI"), the product suite expansion (from red-teaming to Judges, Monitor, Robustify, Sphynx), and the enterprise client base (Deloitte, MongoDB alongside Anthropic, OpenAI) all point in the same direction. Safety testing is the entry wedge; the real business is becoming the enterprise AI reliability/QA platform.

Research as marketing. The bijection learning paper (ICLR), the HuggingFace leaderboard, and the Constitutional Classifiers collaboration serve dual purposes: they advance genuine scientific knowledge AND they demonstrate technical credibility to potential clients and investors. The choice to publish offensive research openly (rather than keeping it proprietary) suggests the reputational value exceeds the competitive risk.

Independence is qualified. Despite Tang's claim of being "totally divorced from any sort of conflict of interest," the company is paid by the labs it tests, bound by NDAs, and admits it doesn't know whether its findings influence deployment decisions. The independence is commercial (they're a vendor), not structural (they're not an independent regulator or auditor).

Near-term focus. All published work addresses adversarial robustness, jailbreaking, and evaluation accuracy. No work on alignment, interpretability beyond attacks, governance frameworks, or long-term safety challenges. The research agenda is tightly focused on problems that have immediate commercial applications.

The revealed theory is not necessarily worse than the stated theory -- it may be more honest. A commercially sustainable company that incrementally improves AI safety testing is plausibly better than a safety organization that runs out of funding. But it means Haize's contribution to AI risk reduction is bounded by what's commercially viable, not by what's most impactful.

Key Assumptions

Assumption 1: Automated red-teaming finds vulnerabilities that matter.

  • Evidence for: ACG, bijection learning, Cascade, and the o1 and Constitutional Classifiers collaborations all demonstrate real vulnerabilities. The inverse scaling finding is a genuine contribution.
  • Evidence against: The "security theater" paper found no case, among those it analyzed, in which red-teaming led to a decision not to release a model. Haize admits they don't know how findings are used. The most dangerous vulnerabilities (CBRN uplift) are tested by domain experts, not by Haize's automated methods.
  • Testable: Yes -- track whether Haize-discovered vulnerabilities are fixed in subsequent model releases.
  • If wrong: Haize provides expensive reassurance rather than genuine risk reduction.

Assumption 2: Third-party commercial testing is sufficiently independent.

  • Evidence for: Haize has no equity in the labs it tests and no formal obligation to produce favorable results.
  • Evidence against: Haize is paid by the labs, bound by NDAs, doesn't know how findings are used, has no whistleblower mechanism, and its angel investors include founders of potential customer companies.
  • Testable: Would Haize publicly criticize a major client? Would they refuse to test a model they believed was too dangerous to release?
  • If wrong: Haize is a paid vendor that provides a veneer of independent oversight, not genuine accountability.

Assumption 3: The enterprise reliability market will sustain a safety-oriented company.

  • Evidence for: Regulatory tailwinds (the EU AI Act requires adversarial testing for general-purpose models with systemic risk), growing enterprise AI adoption, Haize's early multi-million-dollar contracts.
  • Evidence against: Labs may build superior in-house testing. Competitors (Robust Intelligence, Lakera, Patronus AI) crowd the market. Enterprise reliability may pull Haize away from hard safety problems toward easier QA problems.
  • Testable: Monitor product mix over time -- if safety testing revenue shrinks relative to reliability/QA revenue, the mission drift is real.
  • If wrong: Haize becomes a generic AI testing company that happens to have started with a safety narrative.

Assumption 4: The cat-and-mouse dynamic between attacks and defenses is net positive for safety.

  • Evidence for: Constitutional Classifiers (which Haize helped test) reduced jailbreak success from 86% to 4.4%. This is a concrete defensive improvement.
  • Evidence against: Bijection learning demonstrates that scaling creates new attack surfaces. Published attacks (bijection learning is open-source) also empower malicious actors. The space of possible attacks is combinatorially large, so no finite test suite can cover it.
  • Testable: Track whether the fraction of harmful outputs in deployed models decreases over time despite increasing capabilities.
  • If wrong: Automated red-teaming accelerates the arms race without net safety improvement.

Strengths

Genuine technical capability. The team produces real research -- bijection learning at ICLR, the ACG speedup, Cascade, Verdict. These are not marketing claims; they're peer-reviewed contributions that others cite and build on.

The inverse scaling finding is genuinely important. Demonstrating that more capable models are more vulnerable to bijection learning directly challenges the "capabilities-safety correlation" narrative. This finding, if it generalizes, has deep implications for AI safety strategy.

Commercial sustainability. Unlike nonprofit safety orgs dependent on a few philanthropic funders, Haize has a viable business model. Multi-million dollar contracts and a $100M valuation suggest market validation. Commercial sustainability means they can operate independently of the goodwill of any single funder.

Practical impact on defenses. The Constitutional Classifiers collaboration produced a concrete relative reduction in jailbreak success of roughly 95% (from 86% to 4.4%). This is a measurable defensive contribution.
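The size of that reduction can be checked against the 86% and 4.4% jailbreak success rates cited in this profile. The short calculation below is only an arithmetic sketch; the function name is illustrative and not taken from any Haize or Anthropic tooling.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate fell, relative to its starting value."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Jailbreak success rates cited in this profile, in percent.
reduction = relative_reduction(before=86.0, after=4.4)
print(f"{reduction:.1%}")  # prints 94.9%
```

The drop is 81.6 percentage points in absolute terms, or about 94.9% relative to the starting rate, so "roughly 95%" is the accurate summary.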

Low compute dependency. Unlike frontier labs or even some safety research orgs, Haize doesn't depend on gifted compute. Their work requires API access, not H100 clusters. This preserves some independence.

Young, fast, research-oriented. The founding team (all ~24, all Harvard CS, multiple with ML research backgrounds) has published 15+ papers despite the company being under 2 years old. The research-first culture, if maintained, is a strength.

Weaknesses and Risks

The conflict of interest is structural, not incidental. Being paid by the companies you evaluate creates an incentive to produce results that maintain the relationship. No amount of good intentions eliminates this structural problem. True independence requires funding from a third party (government, consortium, or philanthropy). Haize chose the VC path, which trades independence for sustainability.

No mechanism for action on findings. Tang admits they don't know how their findings are used. They have no whistleblower protocol, no disclosure mechanism, and no power to delay or prevent deployment. At best, they provide information; at worst, they provide cover. The decision to act on findings rests entirely with the labs.

The "security theater" critique is not addressed. If red-teaming led to a non-release decision in none of the cases the academic survey analyzed, what evidence is there that Haize's work changes outcomes? The Constitutional Classifiers collaboration shows defensive improvement, but Haize is a testing company, not a defense company. They find problems; they don't control whether problems are fixed.

Mission drift is already visible. The pivot from "Bad day to be a language model" to "Deploy 99.9% Reliable AI" is a move from safety toward enterprise reliability. The product suite expansion (Monitor, Robustify, Sphynx) extends beyond red-teaming into general AI ops. The enterprise client base (Deloitte, MongoDB) has different needs than frontier labs. As commercial pressures increase, the safety mission may become secondary.

Zero engagement with the safety research community. No forum posts, no Coefficient Giving grants, no involvement with EA/rationalist/safety networks. This means Haize is not receiving feedback from or contributing to the intellectual community most focused on AI catastrophic risk. They may be solving the wrong problems at scale.

NDAs limit impact. The most important findings are the ones they can't talk about. NDAs covering vulnerability specifics mean the public (and other researchers) cannot learn from Haize's most significant work.

Tiny team, concentrated risk. ~10-14 employees at a company valued at $100M means enormous key-person risk. If Tang left or the founding team fractured, the company could collapse.

Cross-References

METR and Apollo Research are the closest comparison. Both provide third-party model evaluation, but both are nonprofits with different incentive structures. METR focuses on dangerous capability evaluations (CBRN, autonomy); Apollo focuses on deception and scheming. Haize focuses on automated adversarial robustness. They are complementary rather than competing -- Haize does breadth testing, METR/Apollo do depth testing on the most dangerous capabilities.

Gray Swan (Andy Zou, Zico Kolter) is the closest competitor. They build adversarially robust models rather than testing others' models. Haize tested against Gray Swan models and benchmarks against them. Tang and Zou collaborated at Harvard. There is both competition and collaboration.

Robust Intelligence (acquired for ~$100M in 2023) is the commercial competitor Haize is most often compared to. They focus on AI security rather than red-teaming specifically.

UK AISI and NIST represent the government-led evaluation model that could either complement or displace commercial red-teaming. Tang has been skeptical of NIST's concreteness. Haize does not appear to have direct government partnerships.

Anthropic is both client and collaborator. The Constitutional Classifiers work is Haize's highest-profile contribution. The relationship is symbiotic but asymmetric -- Anthropic controls what is published and how findings are used.

What Would Change This Assessment

  • Evidence that Haize findings delayed or prevented a model release. This would validate the theory of change concretely.
  • Adoption of a whistleblower/disclosure protocol. This would address the governance gap.
  • Tang speaking candidly about existential risk. If the "Crystallizing AI Risk" blog post reveals genuine concern about catastrophic AI risk, it would suggest the commercial framing is strategic rather than reflecting the founders' actual motivations.
  • Transition to a government or consortium funding model. If Haize secured funding from a source other than VCs and client contracts, the independence problem would be partially addressed.
  • Evidence of mission drift -- if safety testing revenue drops below 50% of total revenue, the reliability pivot has won.
  • A high-profile failure -- if a model Haize tested causes significant harm after deployment, it would show whether Haize had been thorough or had merely been providing cover.

Self-Critique

What sources should I have checked but didn't?

  • Tang's "Crystallizing AI Risk" blog post (inaccessible, JS-rendered). This is the biggest gap -- it could significantly change the assessment of his motivations.
  • The VentureBeat founding article (rate-limited). An early profile might reveal initial positioning different from current messaging.
  • Haize's actual website and blog (both blocked). The official self-description is not in the evidence base.
  • The AI Security Podcast full transcript (partially read). The commercial positioning details may be more revealing than the catalog summary suggests.

Where is this analysis potentially biased?

  • I may be applying too harsh a standard for "independence." Most safety evaluators have some relationship with the entities they evaluate. The question is whether Haize's commercial relationship is worse than, say, METR's grant-funded relationship with labs that provide model access.
  • I may be undervaluing the near-term contributions. Reducing jailbreak success from 86% to 4.4% (Constitutional Classifiers) is concretely useful even if it doesn't address existential risk.
  • The absence of EA/LW presence is being read as a negative signal, but Haize could be legitimately pursuing safety through a commercial path that simply doesn't intersect with those communities.

What would a thoughtful person who disagrees say?

  • "Haize is doing something that actually works commercially, which makes it sustainable. Nonprofit safety orgs burn through grants and produce papers nobody reads. Haize produces tools that labs actually use. That's more impactful than forum posts."
  • "The independence problem is overstated. Auditing firms are paid by the companies they audit, and this model works well enough in finance. What matters is reputational skin in the game."
  • "The inverse scaling finding alone justifies Haize's existence. This is a genuinely important safety result that came from a for-profit red-teaming company."

What's my single weakest claim? The claim that Haize's work may be "security theater" is the weakest, because the Constitutional Classifiers collaboration demonstrates concrete defensive value. The security theater critique applies more to the field's overall self-presentation than to Haize's specific technical contributions.

What information would most change my view? Evidence that Haize's findings have directly caused a lab to delay or modify a model release. This would validate the entire theory of change.

Connected to (10)

Allen Institute for AI: staff from · Leonard Tang
Anthropic: evaluates
HuggingFace: collaborator
UK AI Safety Institute: collaborator
Apollo Research: collaborator
Gray Swan: collaborator · Andy Zou
METR: collaborator
OpenAI: evaluates
Scale AI: evaluates
Center for AI Safety: staff from · Dan Hendrycks

Sources (37)
Every URL that was read during research.
  1. Harvard in Tech AI Edition: Leonard Tang, co-founder and CEO of Haize Labs (medium.com)
  2. Haize Labs 🛡️ (angelsround.com)
  3. Startups to Join: Haize Labs (rayansmalik.substack.com)
  4. Company Spotlight: Haize Labs (devcuration.substack.com)
  5. Introducing the Red-Teaming Resistance Leaderboard (huggingface.co)
  6. How Haize Labs Automates LLM Red-Teaming (by Devansh, ML Made Simple) (machine-learning-made-simple.medium.com)
  7. Red Teaming o1 Part 1/2–Automated Jailbreaking w/ Haize Labs' Leonard Tang, Aidan Ewart & Brian Huang (cognitiverevolution.ai)
  8. AI Red Teaming & Securing Enterprise AI with Leonard Tang of Haize Labs (aisecuritypodcast.com)
  9. Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming (arxiv.org)
  10. Constitutional Classifiers: Defending against universal jailbreaks (anthropic.com)
  11. Red-Teaming for Generative AI: Silver Bullet or Security Theater? (arxiv.org)
  12. What OpenAI's Latest Red-Teaming Challenge Reveals About the Evolution of AI 'Safety' Practices (techpolicy.press)
  13. GitHub - haizelabs/get-haized: A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. (github.com)
  14. Endless Jailbreaks with Bijection Learning (arxiv.org)
  15. Endless Jailbreaks with Bijection Learning (arxiv.org)
  16. GitHub - haizelabs/bijection-learning (github.com)
  17. Haize Labs (github.com)
  18. Plentiful Jailbreaks with String Compositions (arxiv.org)
  19. GitHub - haizelabs/dspy-redteam: Red-Teaming Language Models with DSPy (github.com)
  20. General Catalyst leads funding round, valuing promising AI safety startup Haize Labs at $100 million - NextRound.ai (nextround.ai)
  21. Haize Labs Closes $12.5M Seed Round for AI Trust & Safety (trysignalbase.com)
  22. Verdict: A Library for Scaling Judge-Time Compute (arxiv.org)
  23. GitHub - haizelabs/verdict: Inference-time scaling for LLMs-as-a-judge. (github.com)
  24. Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks (anthropic.com)
  25. Leonard Tang — Grokipedia (grokipedia.com)
  26. A Safe Harbor for AI Evaluation and Red Teaming (arxiv.org)
  27. AI red-teaming is a sociotechnical problem: on values, labor, and harms (arxiv.org)
  28. Nimit Kalra CV (nimit.io)
  29. Leonard Tang (scholar.google.com)
  30. Where we are on for-profit AI safety | Apart Research (apartresearch.com)
  31. Security for AI: The New Wave of Startups Racing to Secure the AI Stack | Menlo Ventures (menlovc.com)
  32. The 23-year Old Founder Quietly Making Your AI Safe And Reliable - Leonard Tang, CEO Haize Labs (pods.media)
  33. The 9 Best AI Red Teaming Software Tools in 2026 (Ranked & Reviewed) (ourcodeworld.com)
  34. The crash test dummies for new AI models (sherwood.news)
  35. OpenAI’s Approach to External Red Teaming for AI Models and Systems (arxiv.org)
  36. Unknown (bayesteezian.net)
  37. Haize Labs (job-boards.greenhouse.io)