Gray Swan AI

Research

Hendrycks safety startup. CAIS cross-ref.

Founded: 2024
HQ: Pittsburgh, PA
Team: 29
Structure: C-corp
Model: Mixed

Theory of Change

Gray Swan positions itself as "the safety and security provider for the AI era," focused on enabling enterprises to "deploy AI with confidence." The company was founded by CMU faculty and researchers who "pioneered efforts to identify vulnerabilities in large language models."

The company's own description of its mission: "We believe that ensuring that AI systems can be used safely and securely will prove to be the defining challenge in their widespread adoption in enterprise." (Launch announcement, July 2024)

The theory of change, as articulated through products and public statements, is: (1) Gray Swan discovers AI vulnerabilities through research and Arena competitions, (2) this threat intelligence feeds into defensive tools (Cygnal filtering, Shade testing), (3) enterprises deploy AI securely using these tools, (4) secure deployment makes AI adoption safer. The causal chain runs from vulnerability research to commercial products to enterprise deployment security.

Notably, Gray Swan's public materials contain no language about existential risk, catastrophic AI risk, or AI alignment. The framing is entirely commercial cybersecurity. This contrasts sharply with co-founder Dan Hendrycks' personal view that AI catastrophe is "more likely than not" and co-founder Zico Kolter's philosophical concerns about intelligence emerging "separate from living things."

What They Do

Products. Cygnal: real-time AI input/output filtering with custom policy enforcement ("99.98% attack block rate" per their product page, unverified independently). Shade: automated red-teaming and continuous security testing. AI Red-Teaming Services: expert manual assessments and custom Arena events.

The Arena. Gray Swan's most distinctive asset. Over 12,000 active red-teamers compete in bounty challenges. The UK AISI Agent Red-Teaming Challenge (March-April 2025) was the largest public evaluation of agentic LLM safety to date: 1.8 million attack attempts, 62,000 breaches, $171,800 in prizes, 2,000+ participants, co-judged by UK AISI and US AISI, sponsored by OpenAI, Anthropic, and Google DeepMind. Attack success rates ranged from 1.5% to 6.5% across anonymized models. The Arena also functions as a hiring pipeline (10+ contractors, multiple full-time hires, including VP Nick Winter).
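
As a rough consistency check on the challenge figures above, the pooled attack success rate implied by the totals can be computed directly. This is a minimal sketch: the inputs are the rounded totals quoted in this section, and the constant and function names are illustrative, not taken from Gray Swan's reporting.

```python
# Rounded totals quoted above for the UK AISI Agent Red-Teaming Challenge.
ATTACK_ATTEMPTS = 1_800_000
BREACHES = 62_000
PRIZE_POOL_USD = 171_800


def rate_percent(successes: int, attempts: int) -> float:
    """Return successes / attempts as a percentage."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


# Pooled rate across all models, and prize money spent per recorded breach.
aggregate_asr = rate_percent(BREACHES, ATTACK_ATTEMPTS)
prize_per_breach = PRIZE_POOL_USD / BREACHES

print(f"Aggregate attack success rate: {aggregate_asr:.2f}%")
print(f"Prize money per recorded breach: ${prize_per_breach:.2f}")
```

The pooled rate comes out at roughly 3.4%, which sits inside the reported 1.5% to 6.5% per-model range, as a pooled figure should if the totals cover the same set of models.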

Research. Foundational publications: GCG (first automated LLM jailbreaking method, 400+ citations, 2023), Circuit Breakers (NeurIPS 2024), Representation Engineering (RepE), HarmBench (adopted by US and UK AI Safety Institutes), WMDP (biosecurity/cybersecurity benchmark), Safety Pretraining (NeurIPS 2025, open-source 1.7B SafeLM), ARTEMIS (AI pen-tester outperforming 9/10 human experts, Dec 2025). Research conducted jointly with CMU and CAIS. nanoGCG open-sourced (305 GitHub stars, 73 forks).

Product evolution. At launch (July 2024), Gray Swan offered Cygnet, a Llama-3-based LLM engineered for safety, along with Shade evaluation tools. By 2026, Cygnet had given way to Cygnal (a filtering middleware), the Arena had become central, and the company had hired a Chief Strategy Officer (CSO) from cybersecurity (Tanium) to drive enterprise sales. The pivot from "building safe models" to "securing any model" is commercially significant.

Key People

Matt Fredrikson (CEO): Associate Professor, CMU CyLab. PhD UW-Madison 2015. Research: security, privacy, formal methods. USENIX Test of Time Award 2024. The quietest founder -- almost no public interviews or media presence despite being CEO.

Zico Kolter (Chief Scientist/CTA): Professor and ML Department Head at CMU. Also: OpenAI Board member and Chair of OpenAI's Safety & Security Committee (can halt model releases), Chief Expert at Bosch Center for AI, Schmidt Sciences AI Safety grantee. His simultaneous roles at OpenAI and Gray Swan constitute the company's most significant undisclosed governance issue.

Andy Zou (CTO): CMU PhD (advisors: Kolter, Fredrikson). Thesis defense February 2026. First author of GCG, RepE, Circuit Breakers. His PhD research is literally the technical foundation of Gray Swan's products.

Dan Hendrycks (unpaid advisor, former co-founder): Executive Director of CAIS. Divested all Gray Swan equity in July 2024 after Pirate Wires exposed the conflict between CAIS's co-sponsorship of SB-1047 and his Gray Swan co-founding.

Team size: ~29 employees. All three active founders maintain dual CMU/Gray Swan positions.

Money and Incentives

Funding. $5.68M seed from Juniper Ventures and Lionheart Ventures. No Series A announced despite being 2+ years old with 29 employees. This is notably underfunded compared to direct competitors: XBOW ($120M), Armadin ($190M), RunSybil ($40M), Haize Labs ($12.5M at $100M valuation).
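
To make the size of the funding gap concrete, each competitor's raise can be expressed as a multiple of Gray Swan's seed. This is a minimal sketch using only the figures quoted in this section; the names `GRAY_SWAN_SEED`, `COMPETITOR_FUNDING`, and `funding_multiples` are illustrative.

```python
# Funding figures in USD millions, as quoted in this section.
GRAY_SWAN_SEED = 5.68
COMPETITOR_FUNDING = {
    "XBOW": 120.0,
    "Armadin": 190.0,
    "RunSybil": 40.0,
    "Haize Labs": 12.5,
}


def funding_multiples(baseline: float, others: dict[str, float]) -> dict[str, float]:
    """Return each competitor's funding as a multiple of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return {name: amount / baseline for name, amount in others.items()}


multiples = funding_multiples(GRAY_SWAN_SEED, COMPETITOR_FUNDING)

# Print from largest to smallest multiple.
for name, multiple in sorted(multiples.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {multiple:.1f}x Gray Swan's seed")
```

On these figures the multiples run from about 2x (Haize Labs) to about 33x (Armadin), so the gap varies widely depending on which competitor is the reference point.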

Revenue model. Three streams: (1) SaaS (Cygnal filtering, token-based pricing with 50M free tier), (2) enterprise red-teaming services, (3) Arena operations with lab sponsorships. No concrete revenue figures are publicly available.

Government income. One known government contract: ~GBP 129,000 from the UK government for "robust safeguards and offensive cyber capability measurement." UK AISI and US AISI co-judged Arena competitions.

Lab relationships. OpenAI, Anthropic, Google DeepMind, Meta, Amazon, ByteDance, and Deloitte are described as clients or sponsors. The nature of these relationships -- whether they are paying enterprise customers or Arena sponsors -- is not clearly distinguished in Gray Swan's public materials.

Business model trajectory. At launch (2024), Gray Swan pitched safe AI models. The current (2026) pitch is an enterprise AI security platform. The hire of Rob Jenks (ex-Tanium) as Chief Strategy Officer signals an aggressive enterprise go-to-market. The framing shifted from "we build safe AI" to "we secure your AI," a standard startup pivot toward a larger addressable market.

Incentive structure. Gray Swan is a for-profit C-Corp (not a PBC). No legal obligation to prioritize safety over returns. The company's commercial success depends on demand for AI safety testing -- which increases with AI regulation, high-profile AI failures, and growing enterprise AI adoption. All three co-founders benefit from the narrative that AI is dangerous and enterprises need protection. This creates a structural incentive to emphasize AI risk (commercially useful) that happens to align with genuine safety concerns.

CAIS-Gray Swan pipeline. Research produced at CAIS (funded by Open Philanthropy, FTX, SFF) was subsequently commercialized at Gray Swan. Shared authors include Hendrycks, Zou, and Mazeika across papers like HarmBench, WMDP, and Circuit Breakers. The IP transfer from nonprofit to for-profit is undocumented.

What Others Say

Circuit breaker bypass (TU Munich, 2024). Researchers demonstrated 100% attack success rate against circuit breaker models using three simple modifications -- different optimizer, semantically meaningful initialization, and multiple generations -- "without conducting any further hyperparameter tuning." This directly challenges Gray Swan's flagship defensive technology. Gray Swan has not publicly responded.
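
One way to see why "multiple generations" can matter on its own: if an attack counts as successful when any one of several sampled responses is harmful, the measured success rate rises with the number of samples. The sketch below assumes every generation succeeds independently with the same probability. That is a simplifying assumption made here for illustration, not a claim about the TU Munich paper's method, and the per-sample probabilities are hypothetical, not results from the paper.

```python
def success_within_n(p_single: float, n: int) -> float:
    """Probability that at least one of n independent generations succeeds.

    Assumes every generation succeeds independently with probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p_single) ** n


# Illustrative per-sample success probabilities (hypothetical, not measured).
for p in (0.05, 0.20):
    for n in (1, 10, 50):
        print(f"p={p:.2f}, n={n:>2}: {success_within_n(p, n):.3f}")
```

Under this assumption, even a low per-sample success rate compounds quickly, which is why defenses evaluated on a single generation can look stronger than they are against an attacker who samples repeatedly.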

Red-teaming as security theater (2024, NIST-submitted paper). Finds "prior methods and practices of AI red-teaming diverge along several axes" and argues that "gestures towards red-teaming as a panacea for every possible risk verge on security theater." Notes that responses from LLM developers to red-teaming findings have been "muted and generally mixed."

Pirate Wires investigation (July 2024). Documented the sequence: Senator Wiener contacts CAIS -> CAIS creates Action Fund -> Action Fund co-sponsors SB-1047 -> Gray Swan launches during bill negotiations. Asked: "Will Hendrycks financially benefit from a market that he worked with regulators to create?" Forced Hendrycks' divestment within days.

Nick Winter (before joining as VP). Observed: "There wasn't a clear trend towards newer models being more secure." On Gray Swan's Cygnet models: "I'm impressed by the strength of Gray Swan's defenses, but not as impressed as I'd be if I didn't notice any performance degradation." He also noted the models' over-refusal rates. Independent testing by Confirm Labs found a 39% over-refusal rate on OR-Bench, versus Gray Swan's published 6% on WildChat.

Competitive positioning. An external analyst (March 2026): "Gray Swan spent years perfecting its 'Arena' model and nurturing thousands of the world's top red-teamers." But the same source notes $350M+ was raised by competitors in just two weeks, dwarfing Gray Swan's $5.68M.

Scott Alexander (ACX) described Hendrycks as having "gotten a reputation for being incorruptible" and framed the conflict criticism as coming from "trolls." The AI safety community largely defended Hendrycks despite the structural conflict.

What's Absent

No public conflict-of-interest policy for Kolter's simultaneous OpenAI board/Safety Committee chair and Gray Swan co-founding roles.
No response to the TU Munich paper showing 100% circuit breaker bypass.
No concrete revenue figures or customer counts.
No Series A despite 2+ years of operation.
No long-form interview with any founder specifically about Gray Swan's vision and theory of change.
Zero engagement with the AI safety research community (no forum posts, no Coefficient Giving grants).
No whistleblower policy or dangerous-capability disclosure protocol for Arena participants.
No documentation of the CAIS-to-Gray-Swan IP pipeline.
No independent verification of the "99.98% attack block rate" claim.
No evidence that Arena-discovered vulnerabilities have caused any lab to delay or modify a model release.

Stated Theory of Change

Gray Swan's stated theory of change is enterprise-focused: AI systems are vulnerable, enterprises need to deploy AI securely, Gray Swan provides the tools (Cygnal filtering, Shade testing) and threat intelligence (Arena competitions) to enable secure deployment. The company positions itself as "the safety and security provider for the AI era."

The causal chain: Gray Swan discovers vulnerabilities through research and 12,000+ Arena red-teamers -> threat intelligence feeds into defensive products -> enterprises deploy these products -> AI deployments become more secure -> reduced harm from AI misuse in production.

This is a cybersecurity theory of change, not an AI safety theory of change. It addresses the question "how do we prevent jailbreaking and misuse of deployed AI systems?" rather than "how do we ensure increasingly powerful AI systems remain aligned with human values?"

Revealed Theory of Change

Gray Swan's actions reveal a more nuanced theory:

Offense drives defense. The company's deepest conviction, evident in the research trajectory (GCG -> HarmBench -> Circuit Breakers -> Arena), is that you can only defend AI systems by first understanding how to attack them. This is the classical cybersecurity "red team" philosophy applied to AI. The Arena makes this philosophy into a business: crowdsource attacks, then sell defenses.

Research is the moat. Gray Swan's actual competitive advantage is not its products but its research lineage. GCG (400+ citations), Circuit Breakers (NeurIPS), HarmBench (adopted by national AI Safety Institutes), RepE, WMDP -- this body of work, produced primarily at CMU and CAIS before Gray Swan existed, gives the company credibility that purely commercial competitors lack. The revealed strategy is: maintain research leadership to attract lab partnerships and government sponsorship, then convert credibility into commercial products.

The Arena is the flywheel. The Arena simultaneously generates: threat intelligence data (1.8M attacks), marketing credibility (UK AISI and US AISI co-judging), a hiring pipeline (10+ contractors, VP-level hires), benchmarking data labs want, and a community of 12,000 red-teamers who supply ongoing attack innovation in exchange for prize money rather than salaries. This is a genuinely clever business construction.

Divergence from stated theory. The stated theory is about enterprise security. The revealed actions show something more hybrid: (a) genuine safety research that advances the field, (b) government partnerships that position Gray Swan close to the regulatory apparatus, and (c) commercial products that pay the bills. The tension is that (a) and (b) point toward a public-good mission while (c) requires commercial optimization.

The CAIS-Gray Swan pipeline. Research funded by philanthropy (Open Philanthropy, FTX) through CAIS, and by public universities (CMU), was subsequently commercialized by Gray Swan. This is a standard academic-to-startup pipeline, but the nonprofit intermediary adds a layer: CAIS's mission was reducing AI risk, and Gray Swan commercializes that work for enterprise customers. Whether this is mission-aligned (commercializing safety research makes it more widely used) or mission-diluting (commercial incentives reshape what gets built) depends on execution.

Key Assumptions

1. Adversarial testing produces genuine safety improvements, not just security theater.

Evidence for: The Constitutional Classifiers collaboration (Haize + Gray Swan + UK AISI + Anthropic) reduced jailbreak success from 86% to 4.4%. Arena data provides concrete comparative ASR metrics. Labs pay for this testing.
Evidence against: The "security theater" paper finds no case where red-teaming prevented a model release. Gray Swan has no mechanism to compel labs to act on findings. Nick Winter (now Gray Swan VP) observed that there "wasn't a clear trend towards newer models being more secure."
Testable: Track whether models tested through Gray Swan's Arena show measurably improved safety in subsequent versions.
If wrong: Gray Swan provides expensive reassurance rather than genuine risk reduction -- a dangerous outcome if enterprises believe they are protected when they are not.

2. Circuit Breakers are a viable defense mechanism.

Evidence for: NeurIPS 2024 acceptance. Survived the full month of the Jailbreaking Championship (2 of 3 Cygnet models unbroken). Low added latency.
Evidence against: TU Munich showed 100% bypass with three simple modifications and no hyperparameter tuning. Confirm Labs found 39% over-refusal rate on OR-Bench. The safety-helpfulness tradeoff is real and substantial.
Testable: Whether Gray Swan has updated circuit breakers to resist the TU Munich attack (no public evidence as of March 2026).
If wrong: Gray Swan's core technical contribution is a defensive technique that does not survive serious adversarial pressure, and their product claims are overstated.

3. The Arena community model is sustainable and scalable.

Evidence for: 12,000+ participants, $370K+ paid out, government co-judging, lab sponsorship, VP-level hire from Arena.
Evidence against: Arena operations are expensive (prize pools, hosting, moderation). Lab sponsorship may be contingent on continued novelty. Competitors (XBOW, Armadin) have vastly more capital to build competing platforms.
Testable: Whether Arena participation grows and retains top talent against better-funded alternatives.
If wrong: Gray Swan loses its primary data advantage and hiring pipeline.

4. A $5.68M seed is sufficient to compete in a market where competitors raised $120M-$190M.

Evidence for: Gray Swan has a research pedigree and government relationships that money cannot buy quickly. Arena gives them proprietary data. Academic connections reduce labor costs (PhD students, postdocs).
Evidence against: Enterprise sales requires investment in go-to-market, customer success, and product engineering. The CSO hire signals this need. Without more capital, Gray Swan may be outcompeted on execution even with better research.
Testable: Whether a Series A materializes in 2026.
If wrong: Gray Swan is acquired, runs out of money, or becomes a research shop that cannot compete commercially.

Strengths

1. Research pedigree is unmatched among AI security startups. GCG (400+ citations), Circuit Breakers (NeurIPS), HarmBench (adopted by AI Safety Institutes), WMDP, RepE, Safety Pretraining (NeurIPS 2025), ARTEMIS. No competitor has this depth of peer-reviewed AI safety research. This gives Gray Swan unique credibility with both labs and governments.

2. Government relationships are strategically valuable. UK AISI and US AISI co-judging Arena competitions, UK government contract, OpenAI/Anthropic/DeepMind sponsorship. Gray Swan is positioned as the neutral testing ground that governments trust -- a powerful position if AI safety testing becomes mandatory.

3. The Arena is a genuinely novel competitive asset. No other AI security company has built a 12,000-person community of red-teamers generating continuous threat intelligence. The Arena produces data, talent, credibility, and benchmarks simultaneously. It is hard to replicate quickly even with more capital.

4. Dual CMU positions provide talent access. Three founders with active CMU positions can attract PhD students, postdocs, and researchers. The Safety Pretraining paper (10 authors across CMU, CAIS, Gray Swan) shows this pipeline in action.

5. The founders are technically excellent. Zou's publication record (GCG, RepE, Circuit Breakers) is extraordinary for a researcher under 30. Fredrikson's formal methods background adds rigor that pure ML researchers lack. Kolter's stature opens doors at the highest levels.

Weaknesses and Risks

1. The Kolter conflict is a ticking time bomb. Kolter chairs OpenAI's Safety Committee (can halt model releases) while co-founding a company that sells safety testing tools. Both roles are public, but how the conflict is managed is not. If it attracts the kind of scrutiny the Hendrycks/SB-1047 conflict did, it could be far more damaging because Kolter holds actual governance power. The absence of a disclosed conflict-of-interest policy makes this worse, not better.

2. Circuit Breakers may be fundamentally limited. The TU Munich 100% bypass, the 39% over-refusal rate on OR-Bench, and the Jailbreaking Championship's observation that safety training degrades helpfulness all point to the same issue: circuit breakers trade capability for safety in a way that may not be acceptable to enterprise customers. Gray Swan's commercial pitch ("innovate at full speed without compromising security") may be overpromising.

3. Severe underfunding relative to competition. $5.68M vs. $120M-$190M for competitors. In a market requiring enterprise sales infrastructure, this gap is dangerous. Gray Swan's research advantage may not matter if competitors ship faster and capture enterprise contracts.

4. No evidence of downstream safety impact. Despite generating 62K breaches in the UK AISI challenge and running continuous Arena operations, there is no documented case of Gray Swan's findings causing a lab to delay, modify, or not release a model. The same structural problem that plagues the entire commercial red-teaming industry applies to Gray Swan.

5. The CAIS-Gray Swan IP pipeline raises integrity questions. Research funded by philanthropy (Open Phil's $12.5M to CAIS, FTX's $6.5M) was commercialized at a for-profit startup co-founded by the same people. This is legal and common in academia, but it creates a perception problem: donors who funded CAIS to reduce AI risk may not have expected the research to become a for-profit product.

6. Zero engagement with the AI safety research community. No forum posts, no community discourse, no philanthropic funding. Gray Swan exists entirely in the cybersecurity/enterprise market. This means they are not accountable to or informed by the intellectual community most focused on catastrophic AI risk.

Cross-References

Gray Swan vs. Haize Labs: The most direct competitor. Both do third-party AI red-teaming. Key differences: Gray Swan has the research pedigree (GCG, Circuit Breakers), Haize has more capital ($12.5M vs. $5.68M) and a stronger enterprise reliability pitch. Gray Swan builds defensive technology (circuit breakers, Cygnal); Haize focuses on offensive testing. Leonard Tang (Haize CEO) collaborated with Andy Zou at Harvard. The inverse scaling finding (Haize) and the circuit breaker technique (Gray Swan) address the same problem from opposite sides.

Gray Swan vs. CAIS: Overlapping people, different legal structures. CAIS is the nonprofit producing research and policy advocacy (MAIM, Superintelligence Strategy). Gray Swan is the for-profit commercializing that research. Hendrycks connects both. The question is whether they are complementary (CAIS does research, Gray Swan does products) or whether the for-profit's commercial needs distort the nonprofit's priorities.

Gray Swan vs. XBOW/Armadin/RunSybil: Pure cybersecurity competitors with far more capital (roughly 7x to 33x Gray Swan's $5.68M seed, based on raises of $40M to $190M). They focus on general software vulnerabilities; Gray Swan specializes in AI-specific attacks. Gray Swan's research moat is real but may not survive a well-funded competitor building an equivalent platform.

Gray Swan and OpenAI: Kolter chairs OpenAI's Safety Committee AND co-founded Gray Swan. This is not just a cross-reference; it is a governance entanglement. Kolter helps oversee OpenAI's model releases (via his committee role) while OpenAI is a customer/sponsor of the company he co-founded (via Arena participation).

What Would Change This Assessment

A disclosed conflict-of-interest policy for Kolter's dual role would significantly reduce governance concerns.
A published response to the TU Munich circuit breaker bypass would clarify whether the core technology has evolved.
A Series A raise would signal commercial viability and investor confidence.
Documented evidence that Arena findings changed a lab's deployment decision would validate the entire theory of change.
An interview where Fredrikson or Zou articulate their personal theory of change would clarify whether Gray Swan is a safety company doing enterprise or an enterprise company doing safety.
Open Phil or SFF investing in Gray Swan would signal alignment with the broader AI safety ecosystem.

Self-Critique

Sources I should have checked but did not: The PGH Tech podcast with founders (audio only, no transcript). The 20MinVC episode with Kolter (audio only). These would likely contain the most candid articulations of the founders' actual vision. Also: OpenAI's formal announcement of Kolter (failed to fetch) might contain conflict-of-interest disclosures.

Potential bias: I may be overweighting the governance concerns because the Pirate Wires article is vivid and well-documented, while the positive governance story (Kolter using his OpenAI position to genuinely improve safety) is by nature private and undocumented. The circuit breaker bypass critique may be overstated -- white-box embedding attacks are an unrealistically strong threat model for production systems (as the TU Munich paper itself acknowledges).

A thoughtful person who disagrees would say: "Gray Swan is doing exactly what the AI safety field needs -- taking academic research and making it commercially viable. The Arena is the best public evaluation of AI model safety that exists. The circuit breaker bypass requires white-box access that real attackers don't have. The Kolter situation is normal dual-role stuff in academia. And being VC-funded is actually better than being grant-funded because it forces discipline and sustainability."

Weakest claim: That the CAIS-Gray Swan IP pipeline is problematic. Academic-to-startup technology transfer is how the US innovation system works. The founders had every right to commercialize their research. The question is whether the nonprofit's donors understood this would happen -- and that question may have an innocent answer.

Information that would most change my view: (1) Evidence that Kolter's Safety Committee has actually used its authority to delay an OpenAI model release, which would demonstrate that the dual role produces genuine safety value. (2) A candid interview with Fredrikson or Zou explaining their theory of change for AI risk reduction, which would clarify whether the enterprise framing is strategic or reflects their actual beliefs.

Connected to (13)

Bosch Center for AI: advisor at · Zico Kolter

Schmidt Sciences: collaborator · Zico Kolter

Anthropic: collaborator

Google DeepMind: collaborator

Stanford University: collaborator

UK AI Safety Institute: evaluates

US AI Safety Institute: evaluates

Carnegie Mellon University: collaborator · Matt Fredrikson, Andy Zou, Zico Kolter

Center for AI Safety: spun off from · Dan Hendrycks

Haize Labs: collaborator

OpenAI: board overlap · Zico Kolter

Scale AI: advisor at · Dan Hendrycks

xAI: advisor at · Dan Hendrycks

Sources (63)

Every URL that was read during research.

1. About Gray Swan (grayswan.ai)
2. Statement on SB-1047 and Founders | Gray Swan News (grayswan.ai)
3. The Conflict of Interest at the Heart of CA’s AI Bill (web.archive.org)
4. Gray Swan - Enterprise Security for AI-Powered Applications (grayswan.ai)
5. Gray Swan News (grayswan.ai)
6. Gray Swan Research (grayswan.ai)
7. Careers at Gray Swan (grayswan.ai)
8. Public Launch | Gray Swan News (grayswan.ai)
9. Everyone racing to adopt AI is claiming to be doing so ‘safely.’ This Pittsburgh startup wants to help companies actually follow through. (post-gazette.com)
10. Nick Winter's Blog | My Experiences in Gray Swan AI’s Ultimate Jailbreaking Championship (nickwinter.net)
11. Andy Zou (andyzoujm.github.io)
12. AI Security Suite (grayswan.ai)
13. AI Red-Teaming (grayswan.ai)
14. AI Security Suite (grayswan.ai)
15. AI Red-Teaming (grayswan.ai)
16. Zico Kolter - Wikipedia (en.wikipedia.org)
17. CMU’s Zico Kolter shapes new paths for AI safety and security (nextpittsburgh.com)
18. Zico Kolter: OpenAI's Newest Board Member on The Biggest Questions and Concerns in AI Safety (thetwentyminutevc.com)
19. Gray Swan AI Founders Talk AI Agents, Workflows and Security (pghtech.org)
20. Elon Musk’s xAI safety whisperer joins Scale AI as an advisor | Fortune (fortune.com)
21. Carnegie Mellon's Kolter Joins OpenAI's Board of Directors, Safety and Security Committee (cs.cmu.edu)
22. UK AISI × Gray Swan Agent Red-Teaming Challenge: Results Snapshot | Gray Swan News (grayswan.ai)
23. Google DeepMind and Anthropic Join as Agent Red-Teaming Challenge Sponsors | Gray Swan News (grayswan.ai)
24. SB 1047: Our Side Of The Story (astralcodexten.com)
25. Gray Swan Harmful AI Assistant Challenge (ourinterestingtimes.substack.com)
26. Gray Swan Appoints Rob Jenks as Chief Strategy Officer to Lead Global AI Security Market Expansion | Gray Swan News (grayswan.ai)
27. Adversarial Attacks on Aligned Language Models | Gray Swan Research (grayswan.ai)
28. nanoGCG | Gray Swan News (grayswan.ai)
29. Bio (zicokolter.com)
30. From Representation Engineering to Circuit Breaking: Toward Transparent and Safer AI (cs.cmu.edu)
31. Conducting The First Live Enterprise Comparison Between Agents and Human Professionals | Gray Swan News (grayswan.ai)
32. Improving Alignment and Robustness with Circuit Breakers | Gray Swan Research (grayswan.ai)
33. HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | Gray Swan Research (grayswan.ai)
34. The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning | Gray Swan Research (grayswan.ai)
35. Dan Hendrycks - Wikipedia (en.wikipedia.org)
36. Dan Hendrycks (danhendrycks.com)
37. Dan Hendrycks (time.com)
38. Carnegie Mellon prof. Zico Kolter leads an OpenAI safety panel that can halt unsafe AI releases (wesa.fm)
39. New ARTEMIS AI Agent Outperformed 9 out of 10 Human Penetration Testers in Detecting Vulnerabilities (cybersecuritynews.com)
40. Faster, Please! — The Podcast #75: Superintelligence and National Security: My Chat (+Transcript) with AI Expert Dan Hendrycks (aei.org)
41. California’s legislature just passed AI bill SB 1047; here’s why some hope the governor won’t sign it | TechCrunch (techcrunch.com)
42. Introducing the Gray Swan AI Proving Ground | Gray Swan News (grayswan.ai)
43. Gray Swan News (grayswan.ai)
44. Red-Teaming for Generative AI: Silver Bullet or Security Theater? (arxiv.org)
45. Gray Swan AI (github.com)
46. Revisiting the Robust Alignment of Circuit Breakers (arxiv.org)
47. Gray Swan Featured in Forbes | Gray Swan News (grayswan.ai)
48. Doctoral Thesis Oral - Jiaming (Andy) Zou (csd.cs.cmu.edu)
49. Doctoral Thesis Proposal - Andy Zou (csd.cs.cmu.edu)
50. Center for AI Safety - Wikipedia (en.wikipedia.org)
51. The single biggest flaw in SB 1047 that no one has addressed (insidecyberwarfare.com)
52. When AI Goes On The Offense (aspiringforintelligence.substack.com)
53. Matt Fredrikson (csd.cmu.edu)
54. Matt Fredrikson (mattfredrikson.com)
55. Gray Swan AI Welcomes U.S. AI Safety Institute to the UK AISI Agent Red-Teaming Challenge | Gray Swan News (grayswan.ai)
56. The Jailbreak Cookbook (generalanalysis.com)
57. Jailbreaking Championship 2024 | Gray Swan News (grayswan.ai)
58. GRAY SWAN: Jailbreaking Championship 2024 (csd.cmu.edu)
59. Andy Zou | AI Frontiers (ai-frontiers.org)
60. Safety Pretraining – SafeLM (locuslab.github.io)
61. Quickstart | Gray Swan Docs (docs.grayswan.ai)
62. Features Overview | Gray Swan Docs (docs.grayswan.ai)
63. Gray Swan (cybersecurityintelligence.com)