Theory of Change
Gray Swan positions itself as "the safety and security provider for the AI era," focused on enabling enterprises to "deploy AI with confidence." The company was founded by CMU faculty and researchers who "pioneered efforts to identify vulnerabilities in large language models."
The company's own description of its mission: "We believe that ensuring that AI systems can be used safely and securely will prove to be the defining challenge in their widespread adoption in enterprise." (Launch announcement, July 2024)
The theory of change, as articulated through products and public statements, is: (1) Gray Swan discovers AI vulnerabilities through research and Arena competitions, (2) this threat intelligence feeds into defensive tools (Cygnal filtering, Shade testing), (3) enterprises deploy AI securely using these tools, (4) secure deployment makes AI adoption safer. The causal chain runs from vulnerability research to commercial products to enterprise deployment security.
Notably, Gray Swan's public materials contain no language about existential risk, catastrophic AI risk, or AI alignment. The framing is entirely commercial cybersecurity. This contrasts sharply with co-founder Dan Hendrycks' personal view that AI catastrophe is "more likely than not" and co-founder Zico Kolter's philosophical concerns about intelligence emerging "separate from living things."
What They Do
Products. Cygnal: real-time AI input/output filtering with custom policy enforcement ("99.98% attack block rate" per the company's product page, not independently verified). Shade: automated red-teaming and continuous security testing. AI Red-Teaming Services: expert manual assessments and custom Arena events.
The Arena. Gray Swan's most distinctive asset. Over 12,000 active red-teamers compete in bounty challenges. The UK AISI Agent Red-Teaming Challenge (March-April 2025) was the largest public evaluation of agentic LLM safety to date: 1.8 million attack attempts, 62,000 breaches, $171,800 in prizes, 2,000+ participants, co-judged by UK AISI and US AISI, sponsored by OpenAI, Anthropic, and Google DeepMind. Attack success rates ranged from 1.5% to 6.5% across anonymized models. The Arena also functions as a hiring pipeline (10+ contractors, multiple full-time hires, including VP Nick Winter).
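The headline challenge figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers reported above; the variable names are illustrative, and because the attempt and breach counts are rounded, the result is approximate.

```python
# Figures as reported for the UK AISI Agent Red-Teaming Challenge (rounded in the source).
attack_attempts = 1_800_000
breaches = 62_000
prize_pool_usd = 171_800

# Aggregate attack success rate across all models and participants.
aggregate_rate = breaches / attack_attempts

# Average prize money per successful breach (a rough figure; prizes were not
# necessarily paid per breach, which is an assumption of this calculation).
usd_per_breach = prize_pool_usd / breaches

print(f"Aggregate attack success rate: {aggregate_rate:.2%}")
print(f"Prize pool per breach: ${usd_per_breach:.2f}")
```

The aggregate rate comes out at roughly 3.4%, which sits inside the reported per-model range of 1.5% to 6.5%, so the published figures are internally consistent.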
Research. Foundational publications: GCG (first automated LLM jailbreaking method, 400+ citations, 2023), Circuit Breakers (NeurIPS 2024), Representation Engineering (RepE), HarmBench (adopted by US and UK AI Safety Institutes), WMDP (biosecurity/cybersecurity benchmark), Safety Pretraining (NeurIPS 2025, open-source 1.7B SafeLM), ARTEMIS (AI pen-tester outperforming 9/10 human experts, Dec 2025). Research conducted jointly with CMU and CAIS. nanoGCG open-sourced (305 GitHub stars, 73 forks).
Product evolution. At launch (July 2024), Gray Swan offered Cygnet -- a Llama-3-based LLM engineered for safety -- and Shade evaluation tools. By 2026, Cygnet became Cygnal (a filtering middleware), the Arena became central, and the company hired a CSO from cybersecurity (Tanium) to drive enterprise sales. The pivot from "building safe models" to "securing any model" is commercially significant.
Key People
Matt Fredrikson (CEO): Associate Professor, CMU CyLab. PhD UW-Madison 2015. Research: security, privacy, formal methods. USENIX Test of Time Award 2024. The quietest founder -- almost no public interviews or media presence despite being CEO.
Zico Kolter (Chief Scientist/CTA): Professor and ML Department Head at CMU. Also: OpenAI Board member and Chair of OpenAI's Safety & Security Committee (can halt model releases), Chief Expert at Bosch Center for AI, Schmidt Sciences AI Safety grantee. His simultaneous roles at OpenAI and Gray Swan constitute the company's most significant undisclosed governance issue.
Andy Zou (CTO): CMU PhD (advisors: Kolter, Fredrikson). Thesis defense February 2026. First author of GCG, RepE, Circuit Breakers. His PhD research is the technical foundation of Gray Swan's products.
Dan Hendrycks (unpaid advisor, former co-founder): Executive Director of CAIS. Divested all Gray Swan equity in July 2024 after Pirate Wires exposed the conflict between CAIS's co-sponsorship of SB-1047 and his Gray Swan co-founding.
Team size: ~29 employees. All three active founders maintain dual CMU/Gray Swan positions.
Money and Incentives
Funding. $5.68M seed from Juniper Ventures and Lionheart Ventures. No Series A has been announced, although the company is 2+ years old and has roughly 29 employees. Gray Swan is notably underfunded compared to direct competitors: XBOW ($120M), Armadin ($190M), RunSybil ($40M), Haize Labs ($12.5M at $100M valuation).
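To put the funding gap in proportion, the amounts above can be expressed as multiples of Gray Swan's seed round. This is a rough sketch: the figures are taken as stated in the text, and treating them as directly comparable (the source does not say whether each is a single round or total raised) is an assumption.

```python
# Funding figures in millions of USD, as listed in the text above.
gray_swan_seed = 5.68
competitors = {
    "XBOW": 120.0,
    "Armadin": 190.0,
    "RunSybil": 40.0,
    "Haize Labs": 12.5,
}

# Each competitor's funding as a multiple of Gray Swan's seed round.
multiples = {name: amount / gray_swan_seed for name, amount in competitors.items()}
competitor_total = sum(competitors.values())

for name, multiple in sorted(multiples.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {multiple:.1f}x Gray Swan's seed")
print(f"Combined competitor funding: ${competitor_total}M")
```

Even the smallest listed competitor has raised more than twice Gray Swan's seed, and the four together account for $362.5M.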
Revenue model. Three streams: (1) SaaS (Cygnal filtering, token-based pricing with a 50M-token free tier), (2) enterprise red-teaming services, (3) Arena operations with lab sponsorships. No concrete revenue figures are publicly available.
Government income. One known government contract: ~GBP 129,000 from the UK government for "robust safeguards and offensive cyber capability measurement." UK AISI and US AISI co-judged Arena competitions.
Lab relationships. OpenAI, Anthropic, Google DeepMind, Meta, Amazon, ByteDance, and Deloitte are described as clients or sponsors. The nature of these relationships -- whether they are paying enterprise customers or Arena sponsors -- is not clearly distinguished in Gray Swan's public materials.
Business model trajectory. Launch (2024) pitched safe AI models. Current (2026) pitches enterprise AI security platform. CSO hire (Rob Jenks, ex-Tanium) signals aggressive enterprise go-to-market. The framing shifted from "we build safe AI" to "we secure your AI" -- a standard startup pivot toward a larger addressable market.
Incentive structure. Gray Swan is a for-profit C-Corp (not a PBC). No legal obligation to prioritize safety over returns. The company's commercial success depends on demand for AI safety testing -- which increases with AI regulation, high-profile AI failures, and growing enterprise AI adoption. All three co-founders benefit from the narrative that AI is dangerous and enterprises need protection. This creates a structural incentive to emphasize AI risk (commercially useful) that happens to align with genuine safety concerns.
CAIS-Gray Swan pipeline. Research produced at CAIS (funded by Open Philanthropy, FTX, SFF) was subsequently commercialized at Gray Swan. Shared authors include Hendrycks, Zou, and Mazeika across papers like HarmBench, WMDP, and Circuit Breakers. The IP transfer from nonprofit to for-profit is undocumented.
What Others Say
Circuit breaker bypass (TU Munich, 2024). Researchers demonstrated a 100% attack success rate against circuit breaker models using three simple modifications -- a different optimizer, semantically meaningful initialization, and multiple generations -- "without conducting any further hyperparameter tuning." This directly challenges Gray Swan's flagship defensive technology. Gray Swan has not publicly responded.
Red-teaming as security theater (2024, NIST-submitted paper). The paper finds that "prior methods and practices of AI red-teaming diverge along several axes" and argues that "gestures towards red-teaming as a panacea for every possible risk verge on security theater." It also notes that responses from LLM developers to red-teaming findings have been "muted and generally mixed."
Pirate Wires investigation (July 2024). Documented the sequence: Senator Wiener contacts CAIS -> CAIS creates Action Fund -> Action Fund co-sponsors SB-1047 -> Gray Swan launches during bill negotiations. Asked: "Will Hendrycks financially benefit from a market that he worked with regulators to create?" Forced Hendrycks' divestment within days.
Nick Winter (before joining as VP). Observed that "There wasn't a clear trend towards newer models being more secure." On Gray Swan's Cygnet models: "I'm impressed by the strength of Gray Swan's defenses, but not as impressed as I'd be if I didn't notice any performance degradation." He also noted over-refusal rates. Independent testing by Confirm Labs found a 39% over-refusal rate on OR-Bench vs. Gray Swan's published 6% on WildChat; the two figures come from different benchmarks and are not directly comparable.
Competitive positioning. An external analyst (March 2026): "Gray Swan spent years perfecting its 'Arena' model and nurturing thousands of the world's top red-teamers." But the same source notes $350M+ was raised by competitors in just two weeks, dwarfing Gray Swan's $5.68M.
Scott Alexander (ACX) described Hendrycks as having "gotten a reputation for being incorruptible" and framed the conflict criticism as coming from "trolls." The AI safety community largely defended Hendrycks despite the structural conflict.
What's Absent
No public conflict-of-interest policy covering Kolter's simultaneous roles as OpenAI board member, Safety & Security Committee chair, and Gray Swan co-founder. No response to the TU Munich paper showing 100% circuit breaker bypass. No concrete revenue figures or customer counts. No Series A despite 2+ years of operation. No long-form interview with any founder specifically about Gray Swan's vision and theory of change. Zero engagement with the AI safety research community (no forum posts, no grants from Coefficient Giving, formerly Open Philanthropy). No whistleblower policy or dangerous-capability disclosure protocol for Arena participants. No documentation of the CAIS-to-Gray-Swan IP pipeline. No independent verification of the "99.98% attack block rate" claim. No evidence that Arena-discovered vulnerabilities have caused any lab to delay or modify a model release.
Recommended Reading
Nick Winter's Jailbreaking Championship blog -- The most candid, technically detailed external account of Gray Swan's technology. Written before Winter joined as VP. Covers attack techniques, competition structure, circuit breaker behavior, and honest observations about safety-capability tradeoffs. https://www.nickwinter.net/posts/my-experiences-in-gray-swan-ais-ultimate-jailbreaking-championship
Pirate Wires: "The Conflict of Interest at the Heart of CA's AI Bill" -- Investigative piece that laid out the CAIS/SB-1047/Gray Swan connection and forced Hendrycks' divestment. The strongest critical source. https://web.archive.org/web/2024/https://www.piratewires.com/p/sb-1047-dan-hendrycks-conflict-of-interest
"Revisiting the Robust Alignment of Circuit Breakers" (TU Munich) -- Three simple modifications break Gray Swan's flagship defense with 100% success rate. Essential for assessing technical claims. https://arxiv.org/html/2407.15902v2
"Red-Teaming for Generative AI: Silver Bullet or Security Theater?" -- Academic paper arguing the entire AI red-teaming field may produce reassurance rather than genuine safety improvement. Directly relevant to Gray Swan's value proposition. https://arxiv.org/html/2401.15897v3
WESA: Kolter leads OpenAI safety panel that can halt releases -- Best source on the undisclosed governance tension at the heart of Gray Swan. https://www.wesa.fm/health-science-tech/2025-11-09/carnegie-mellon-zico-kolter-openai-safety