Apart Research

Research

Community hackathons. Low-barrier.

Founded: 2022
HQ: Remote-first (Denmark origin, SF presence via Seldon Lab)
Team: 7
Structure: fiscally sponsored
Model: Grants

Theory of Change

Apart Research's theory of change is that AI safety research can be accelerated by making it radically accessible. Their causal chain: weekend hackathons draw thousands of participants worldwide -> the best projects are invited into an 8-week Studio -> the best of those advance to a 12-24 week Fellowship -> fellows produce peer-reviewed papers and transition into AI safety careers. The mechanism is progressive filtering from a wide initial funnel.

In their own words: "We are a non-profit research and community-building AI safety lab with a strategic target on high-volume frontier technical research." The explicit emphasis on "high-volume" distinguishes Apart from orgs pursuing deep, focused research.

Esben Kran (founder): "Non-profits lack an engine for growth and don't benefit from the compound interest that drives for-profit ventures. They remain perpetually at the mercy of their funders -- indefinitely." He has publicly argued the non-profit model is broken and advocates for-profit AI safety as a complement.

Richard Ngo's endorsement captures the external view: "Building a culture of hands-on experimentation is probably the best way to do AI safety outreach, and Apart seems to have executed on it really well." The word "outreach" is notable -- the strongest endorsement frames Apart as community infrastructure, not a research lab.

What They Do

Research output: 22 peer-reviewed publications across NeurIPS, ICLR, ACL, ICML, EMNLP (13 peer-reviewed, 6 main conference, 9 workshop). The standout result is DarkBench (ICLR 2025 Oral Spotlight, top 1.8%), a benchmark for dark design patterns in LLMs across 660 prompts. Min-p Sampling also received an ICLR 2025 Oral Spotlight. Two oral spotlights at a top venue in one year is an extraordinary result for an org of this size.

Other notable work: Catastrophic Cyber Capabilities Benchmark (3CB), contributions to Anthropic's Sleeper Agents paper (via former CTO Fazl Barez), sandbox attack research on SAEs, and benchmark inflation detection.

Scale: 3,000-3,500+ participants across 42-55+ sprints, 50+ global locations, 26+ countries. 485 research reports submitted. 100+ research fellows. The pipeline is genuinely global and low-barrier.

METR partnership (Code Red Hackathon): 128 participants, 231 task ideas, 108 specifications, 28 full implementations following METR's task standard. METR confirmed the collaboration was "useful to METR's work." One participant joined METR full-time. Some tasks are now used by UK AISI. This is the strongest evidence of real-world safety impact.

Recent hackathon quality: March 2026 sprint outputs include multi-agent red-teaming of AI control evaluations, sandbagging detection via steering vectors (closing 85% of the gap), and HYDRA (88% jailbreak success on Claude Sonnet 4). Recent hackathon work appears to be improving in quality and converging on the AI control agenda.

Placements: Claims 30 placements at 20+ organizations. Named recipients include METR, Martian, Palisade Research, Anthropic, Redwood Research, AISI. Self-reported survey of 25 fellows: 10 now work primarily in AI safety, with Apart's contribution rated 5.6/10.

Policy engagement: Claims EU AI Act Code of Practice expert consultant role, IASEAI presentations, DC policy hackathon.

Key People

Esben Kran (Founder, Board Chairman). Left grad school at 22 to start Apart. Now co-running Seldon Lab, a for-profit AI safety startup accelerator in SF. P(doom) 20-40%. Candid critic of the non-profit model who argues for-profit AI safety companies are necessary. His primary attention has visibly shifted to Seldon.

Jaime Raldua Veuthey (CEO). Entered AI safety through an Apart hackathon, became CTO, then CEO when Esben stepped back. Co-authored DarkBench. His trajectory embodies the pipeline but also raises the question of whether a hackathon-to-CEO path provides sufficient depth for leading a research organization.

Jason Hoelscher-Obermaier (Director of Research). PhD quantum physics, 9 years AI research engineering. Most research-credentialed person on the core team. Co-authored the Cooperative AI Foundation multi-agent risks report. Low public visibility.

Former CTO Fazl Barez (co-author on Sleeper Agents) departed for Oxford Martin AIGI -- a notable loss of research credibility.

Board: Esben Kran, Buck Shlegeris (Redwood Research CEO), Mathias Kirk Bonde (ControlAI), Finn Metz (Seldon Lab co-founder). Buck's involvement is the strongest credibility signal from technical AI safety.

Money and Incentives

Total funding: ~$1.5-2M lifetime (estimate). $700K raised in 2024 ("biggest year"). $250K total from CG/Open Phil (2 grants: $89K in Dec 2022, $161K in Jan 2025 -- both narrowly scoped to hackathons). Remaining funding from LTFF, SFF, Manifund regranters, and individual donors.

2025 funding crisis: Sought $955K (12 months' runway). Budget: Staff $691K (73%), Programs $156K (16%), Indirect $108K (11%). Mid-campaign raised $597K. Campaign completed July 2025, securing "funding into 2026." Root cause: concentrated funder dependency, insufficient funder relationship management, and scaling assumptions that didn't materialize.
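As a rough check on the budget arithmetic above, the sketch below recomputes each line's share of the $955K ask and the fraction covered mid-campaign. All dollar figures come from the text; the variable and function names are illustrative only. The unrounded shares land within a point of the rounded percentages quoted above.

```python
# Budget lines from the 2025 fundraiser, in thousands of USD (figures from the text above).
budget_k = {"Staff": 691, "Programs": 156, "Indirect": 108}

total_k = sum(budget_k.values())  # should match the $955K ask


def shares(lines):
    """Return each line's share of the total as a percentage, rounded to one decimal."""
    total = sum(lines.values())
    return {name: round(100 * amount / total, 1) for name, amount in lines.items()}


budget_shares = shares(budget_k)

# Share of the ask covered mid-campaign ($597K of $955K).
raised_mid_k = 597
covered_pct = round(100 * raised_mid_k / total_k, 1)

if __name__ == "__main__":
    print(f"Total ask: ${total_k}K")
    for name, pct in budget_shares.items():
        print(f"{name}: {pct}%")
    print(f"Covered mid-campaign: {covered_pct}%")
```

The three lines sum exactly to the stated ask, which suggests the budget breakdown is complete rather than a partial list.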

Cost efficiency: ~$12.5K per hackathon, ~$5K per fellowship participant, ~$1K per studio participant. These are among the lowest costs in the AI safety talent pipeline: at the fellowship level, roughly 3x cheaper per participant than LASR Labs (~$14K) and ~6x cheaper than MATS (~$32K).

Funder revealed preference: CG/Open Phil has given Apart $250K over 3 years (scoped to hackathons) vs. GBP 8M+ to LASR Labs/Arcadia Impact (for research fellowships). This 30:1 funding ratio despite Apart having 10x more participants is the clearest signal of how major funders assess Apart's per-participant impact relative to more selective programs.
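The funding ratio mixes currencies (USD for Apart, GBP for LASR), so the exact multiple depends on the exchange rate used. The sketch below shows how the ratio moves under a few rates. The grant totals come from the text, the GBP 8M figure is a lower bound, and the exchange rates are illustrative assumptions rather than figures from any source.

```python
APART_USD = 250_000    # CG/Open Phil total to Apart (from the text)
LASR_GBP = 8_000_000   # lower bound on LASR Labs/Arcadia Impact funding (from the text)


def funding_ratio(gbp_to_usd):
    """LASR-to-Apart funding ratio at a given GBP->USD exchange rate."""
    return LASR_GBP * gbp_to_usd / APART_USD


# Exchange rates are hypothetical values chosen for illustration only.
ratios = {rate: round(funding_ratio(rate), 1) for rate in (1.0, 1.25, 1.35)}

if __name__ == "__main__":
    for rate, ratio in ratios.items():
        print(f"GBP->USD {rate}: about {ratio}:1")
```

Treating pounds as dollars gives 32:1, and any realistic conversion pushes the ratio higher, so "30:1" is best read as a conservative round number.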

Manifund signal: One regranter gave $100K but later indicated willingness to fund at ~$50K, citing quality concerns about hackathon output (though caveated positively with conference feedback).

Fiscal structure: Danish non-profit using Ashgro Inc (US 501(c)(3)) as fiscal sponsor. No independently auditable financials.

Compute partnership: Lambda provides $5K per Lab team, $400 per hackathon team.

Key incentive tensions:

The founder publicly argues the non-profit model is broken while running a non-profit. His co-founding of Seldon Lab (for-profit accelerator) is the logical consequence.
Two of four board members (Esben, Finn) co-founded Seldon Lab, which explicitly uses Apart's non-profit hackathon pipeline as a talent source ("Access to Apart Research's hackathon pipeline, 3,000+ annual participants" -- from Seldon's batch 2 RFS).
No public conflict-of-interest policy governs the Apart/Seldon relationship.
73% of budget going to staff salaries for 5-8 people means the org's survival is essentially a payroll question; program costs are only $156K.

What Others Say

Strongest endorsement: Richard Ngo: "Building a culture of hands-on experimentation is probably the best way to do AI safety outreach, and Apart seems to have executed on it really well."

Alumni testimonials (named, specific):

Philip Quirke (Martian Research Lead): "Apart Research transformed my career and life trajectory... In less than two years, they helped me transition from an IT professional with zero AI experience to a Research Lead."
Amir Abdullah (Thoughtworks AI Safety Lead): "I found myself out of research positions for the next several years due to location, time and family constraints... As a consequence of Apart's support and the 3 papers I published with them, I received my first full time offer."
Sami Jawhar joined METR full-time via Code Red hackathon.

Institutional endorsement: METR confirmed the Code Red hackathon was "useful to METR's work." UK AISI uses some hackathon-produced evaluation tasks. Tyler Tracy (Redwood Research) credits Apart hackathon with generating interest in AI control.

Strongest counterargument: The hackathon model optimizes for accessibility and volume, not depth or novelty. Of 3,500+ participants producing 485+ research reports, 22 reached peer review (~4.5% yield). The per-dollar question is whether scaling research sprints produces more safety-relevant work than funding fewer, deeper programs. CG/OP's revealed preference (funding LASR at 30x the level of Apart) suggests major funders believe selective programs produce more value per dollar, even if they reach fewer people.

Criticism scarcity: Despite 46+ searches, no substantive published criticism of Apart exists on LW/EA Forum. This is likely because Apart is too small to attract critical attention, not because it's above reproach.

Fellowship ecosystem data (independent): Chris Clay's analysis of 600+ alumni across 9 programs found Apart Labs had 52 tracked alumni, with 17.3% doing another fellowship afterward (vs. 11.1% average). This positions Apart as more of a feeder program than an endpoint.

What's Absent

No independent evaluation of impact. The 30 placements claim and all outcome data are self-reported.
No audited financials or published annual reports.
No public research strategy document explaining how research areas are prioritized.
No citation or adoption metrics for published papers (DarkBench adoption by labs would be key evidence).
No conflict-of-interest policy governing the Apart/Seldon Lab relationship.
No data on hackathon participant retention (repeat participation rates).
No explanation for Fazl Barez's departure (most credentialed early researcher).
No public record of hackathon topic selection process.

Stated Theory of Change

Apart claims a three-part theory of change:

Talent discovery through doing. The hackathon model bypasses traditional gatekeeping (PhD applications, competitive fellowships) and identifies talent through demonstrated output. Anyone can participate; advancement is merit-based on hackathon results, not credentials.
Progressive quality filtering. The Sprint -> Studio -> Fellowship pipeline filters 3,500+ initial participants down to ~100 fellows who produce peer-reviewed research. The low cost per participant (~$1K at the Studio stage, ~$5K at the Fellowship stage, with hackathons costing ~$12.5K per event) means the model can afford a low conversion rate because the cost of "casting a wide net" is low.
Research that matters. The published papers, partnerships with METR/AISI/Anthropic/OpenAI, and the DarkBench ICLR oral spotlight demonstrate that hackathon-originating research can reach the frontier of the field.

The causal chain from actions to AI risk reduction is: accessible entry points -> more people try AI safety -> progressive filtering identifies strong researchers -> they produce safety-relevant research and/or enter safety careers -> cumulative effect reduces AI risk.
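To make the funnel concrete, the sketch below computes stage-to-stage rates from the headline counts quoted in this profile. It is a rough illustration only: the counts are lower bounds, the stages use different units (people, reports, papers), and they are not strictly nested cohorts, so the rates should not be read as true per-person conversion probabilities.

```python
# Headline counts quoted in this profile (lower bounds; units differ by stage).
funnel = [
    ("sprint participants", 3500),
    ("research reports", 485),
    ("research fellows", 100),
    ("peer-reviewed papers", 22),
]


def step_rates(stages):
    """Percentage of each stage's count relative to the stage before it."""
    rates = []
    for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
        rates.append((f"{prev_name} -> {name}", round(100 * n / prev_n, 1)))
    return rates


rates = step_rates(funnel)

# Reports-to-papers yield, skipping the fellowship stage.
report_yield = round(100 * 22 / 485, 1)

if __name__ == "__main__":
    for label, pct in rates:
        print(f"{label}: {pct}%")
    print(f"research reports -> peer-reviewed papers: {report_yield}%")
```

The reports-to-papers figure reproduces the ~4.5% yield used elsewhere in this assessment.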

Revealed Theory of Change

Apart's actions partially match its stated theory but reveal important divergences:

Match: The talent pipeline works for some people. Philip Quirke (IT professional to Martian Research Lead), Amir Abdullah (out-of-academia researcher to Thoughtworks AI Safety Lead), Sami Jawhar (METR hire via Code Red hackathon) -- these are genuine career transitions that likely would not have happened without Apart's low-barrier entry point. The Partnered Fellowship model (collaborating with Martian, Heron) shows evolution toward more structured mentorship.

Divergence 1: The research is emergent, not directed. Apart does not have a published research agenda. Its research output reflects what hackathon participants and fellows choose to work on, filtered by what gets accepted at conferences. This is fundamentally different from directed research organizations that identify priority problems and work backward. The result is a portfolio that is broad but lacks strategic coherence -- DarkBench (dark patterns), 3CB (cyber capabilities), SAE collusion, benchmark inflation, min-p sampling, and multi-agent security are all interesting but don't build cumulatively toward solving a specific safety problem.

Divergence 2: The founder is building something else. Esben Kran's co-founding of Seldon Lab and transition to Board Chairman signals that his primary creative energy is going into the for-profit venture. Seldon Lab's explicit use of Apart's hackathon pipeline as a talent source means the non-profit is becoming (in part) a feeder for the for-profit. Whether this is positive (creating for-profit safety infrastructure) or negative (extracting value from a non-profit community) depends on your view of for-profit AI safety, but the organizational trajectory is clear.

Divergence 3: Apart is community infrastructure more than a research lab. Richard Ngo's "outreach" framing, the event logistics testimonials, the 3,500+ participant numbers, and the 5.6/10 self-rated contribution score all point to Apart's primary function being community building with research as a secondary output. This is not a criticism -- it may be the right thing to do -- but it differs from the "AI safety lab" framing on the website.

Key Assumptions

1. Weekend hackathons can produce research that advances AI safety.

Evidence for: DarkBench ICLR oral spotlight, METR task implementations used by UK AISI, recent AI control hackathon producing substantive work on multi-agent red-teaming and sandbagging detection.
Evidence against: 4.5% yield from research reports to peer review. Min-p sampling (the other oral spotlight) is a general ML contribution, not safety-specific. Most hackathon output never advances beyond a weekend report.
Testable: Track citation counts and downstream adoption of Apart-originated papers and benchmarks.
If wrong: Apart's research function is a marketing overlay on what is fundamentally a community-building operation.

2. Low-barrier entry discovers talent that selective programs miss.

Evidence for: Apart reaches people in 26+ countries, including Africa, SE Asia, South America, and Eastern Europe. Career transitions like Philip Quirke (IT professional) and Amir Abdullah (out-of-academia PhD) suggest genuine reach into underserved populations.
Evidence against: The fellowship alumni study shows Apart is primarily a feeder program (17.3% do another fellowship after, vs. 11.1% average). MATS has 6.9% doing another fellowship after -- more of an endpoint. If Apart mainly feeds people into MATS and other selective programs, the counterfactual question is whether those people would have found their way there anyway.
Testable: Survey Apart alumni about whether they would have entered AI safety without Apart specifically.
If wrong: Apart accelerates people who were already heading toward AI safety, rather than discovering genuinely new talent.

3. The hackathon model can scale without quality degradation.

Evidence for: Recent hackathon outputs (March 2026) show sophisticated AI control work. The Partnered Fellowships model with Martian and Heron adds expert mentorship.
Evidence against: One Manifund regranter who gave $100K later indicated they would fund at only ~$50K, citing quality concerns about hackathon output. The research is not strategically directed. The core team (5-8 FTE) is too small to provide substantive research supervision at scale.
Testable: Compare publication quality and placement outcomes across years as the program has grown.
If wrong: Scaling produces more noise without more signal, diluting the brand and consuming resources that could go to higher-quality programs.

4. The non-profit/for-profit hybrid model serves the safety mission.

Evidence for: Seldon Lab batch 1 companies are "deployed at xAI and Anthropic" doing safety-relevant work (verifiable compute, agent evaluation, cyber defense). If for-profit AI safety scales, the Apart -> Seldon pipeline creates genuine safety infrastructure.
Evidence against: The non-profit community's talent and attention are being redirected toward the for-profit without clear governance. Two of four board members run the for-profit. No conflict-of-interest policy exists. Donors contributing to Apart may not realize they're indirectly supporting Seldon's for-profit talent funnel.
Testable: Does Seldon's portfolio actually produce safety infrastructure, or does it produce AI capabilities companies with safety branding?
If wrong: Apart becomes a non-profit laundering operation for a for-profit accelerator, with donated resources subsidizing commercial ventures.

Strengths

Extraordinary cost-effectiveness. At $12.5K per hackathon and $5K per fellowship participant, Apart is among the cheapest talent pipeline operations in AI safety. Even if the per-participant conversion rate is low, the per-dollar conversion rate may be competitive.

Genuine global reach. 50+ locations in 26+ countries. Participants from Africa, SE Asia, South America. This is a real differentiator against Bay Area-centric programs. The remote-first, part-time model removes geographic and employment barriers that MATS, LASR, and ARENA cannot.

DarkBench proves the model's ceiling. Two ICLR oral spotlights in one year from a hackathon-originating pipeline is a concrete achievement. It demonstrates that the ceiling of hackathon research is genuinely high, even if the average quality is low.

METR partnership demonstrates institutional value. The Code Red hackathon produced 28 full implementations following METR's task standard; METR confirmed the collaboration was useful to its work, and some tasks are now used by UK AISI. This is a specific, verifiable, safety-relevant outcome that cannot be dismissed as "just community building."

Honest self-criticism. Esben's public arguments against the non-profit model, the 5.6/10 self-rated contribution score, and the candid fundraiser communications suggest an organizational culture that values honesty over self-promotion. This is rare in the non-profit space.

Weaknesses and Risks

The founder's attention has left the building. Esben co-founding Seldon Lab and becoming Board Chairman while Jaime (a hackathon alumnus) becomes CEO is the most significant organizational risk. The question is whether Jaime has the research depth and strategic vision to lead, and whether the board (with 2 of 4 members running Seldon) provides adequate oversight.

Research without strategy. Apart's research output is whatever emerges from hackathons and fellowships. There is no published research agenda, no strategic prioritization, and no mechanism to direct effort toward the most important safety problems. This produces breadth but not depth.

Funder skepticism is real. CG/OP funding Apart at $250K (hackathons only) while funding LASR at GBP 8M+ is a clear revealed preference about perceived impact. The 2025 funding crisis was not just bad luck -- it reflects structural concerns about the model's value proposition at the institutional funder level.

The Apart/Seldon governance problem. Non-profit resources flowing to a for-profit venture without formal governance structures is a textbook conflict of interest. Even if the intent is aligned with the safety mission, the structure invites abuse and undermines donor trust.

Impact claims rest on weak evidence. "30 placements at 20+ organizations" is self-reported. The 5.6/10 contribution rating from 25 respondents is modest. No independent evaluation exists. The strongest placement evidence (Sami Jawhar at METR) is a single case.

Cross-References

AI Safety Camp (AISC): Most similar in philosophy -- both are low-cost, low-barrier, volunteer-driven programs that cast a wide net. AISC is part-time (10 hrs/week for 3 months); Apart's hackathons are even shorter (weekend). AISC has the "Remmelt problem" (controversial co-leader blocking institutional funding); Apart has the "Esben problem" (founder shifting to for-profit). Both operate at ~$600-$5K per participant. AISC has stronger alumni outcomes (Apollo Research, Arb Research founded by alumni) but weaker publication record. Key difference: AISC's projects are 3 months; Apart's sprint model produces research in a weekend and filters upward.

LASR Labs: The high-bar, high-cost comparison. LASR is selective (20/cohort), expensive (~$14K/participant), supervisor-driven, and produces main conference papers consistently. Apart is open (3,500+), cheap ($1-5K/participant), participant-driven, and produces occasional main conference papers from a much larger pool. CG funds LASR at 30x the rate of Apart. LASR's revealed theory is "invest heavily in fewer people"; Apart's is "invest lightly in many and find the outliers."

MATS: The gold standard comparison. MATS is the endpoint fellowship (6.9% do another fellowship after), with a 4.3% acceptance rate and a cost of ~$32K/person. Apart is a feeder fellowship (17.3% do another fellowship after). They are genuinely complementary -- Apart can serve as the "try AI safety" step before people apply to MATS. Ryan Kidd (MATS director) has endorsed this framing for AISC, which is similar to Apart's role.

Seldon Lab: The for-profit sister org, not a comparable. But the relationship is central: Seldon leverages Apart's talent pipeline, its batch 1 companies work on genuine safety problems (compute verification, agent evaluation), and it represents Esben's vision for scaling safety beyond non-profit constraints. The success or failure of Seldon will determine whether the Apart -> Seldon pipeline is vindicated or exposed as mission drift.
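The per-participant cost comparisons in this section can be checked with a few lines of arithmetic. The sketch below uses only the approximate figures quoted in this profile (~$1K Studio, ~$5K Fellowship, ~$14K LASR, ~$32K MATS); the names are illustrative, and the multiples inherit whatever error is in those estimates.

```python
# Approximate per-participant cost figures quoted in this profile, in USD.
APART_FELLOWSHIP = 5_000
APART_STUDIO = 1_000
COMPARATORS = {"LASR Labs": 14_000, "MATS": 32_000}


def cost_multiple(comparator_cost, apart_cost):
    """How many times more the comparator spends per participant than Apart."""
    return round(comparator_cost / apart_cost, 1)


vs_fellowship = {
    name: cost_multiple(cost, APART_FELLOWSHIP) for name, cost in COMPARATORS.items()
}
vs_studio = {
    name: cost_multiple(cost, APART_STUDIO) for name, cost in COMPARATORS.items()
}

if __name__ == "__main__":
    for name in COMPARATORS:
        print(f"{name}: {vs_fellowship[name]}x Apart Fellowship, "
              f"{vs_studio[name]}x Apart Studio")
```

Which Apart stage is the right comparator is a judgment call: the Fellowship is the closest analogue to LASR and MATS, while the Studio figure shows how cheap the earlier filter is.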

What Would Change This Assessment

Evidence that would significantly upgrade my view:

Independent evaluation showing high counterfactual impact (i.e., a significant fraction of Apart alumni would not have entered AI safety otherwise)
DarkBench or 3CB being widely adopted by frontier labs as standard benchmarks
The Partnered Fellowship model producing consistently high-quality, safety-relevant research across multiple partners
Formal governance structures for the Apart/Seldon relationship with independent oversight

Evidence that would significantly downgrade my view:

Evidence that most Apart placements would have entered AI safety through other channels
Seldon Lab producing companies that are capabilities-focused with safety branding
Board members making decisions that benefit Seldon at Apart's expense
Publication quality declining as scale increases

Self-Critique

What sources should I have checked but didn't?

Full FLI podcast transcript with Esben (only a stub was available)
Manifund project page evaluator comments (blocked by rate limiting)
VentureBeat DarkBench article (could not access)
Full EA Forum "dire need of funding" comment thread (blocked domain)
Google Scholar citation counts for Apart's publications

Where is this analysis potentially biased?

I may be overweighting the governance concern about Seldon Lab because it is the most intellectually interesting finding. The relationship may be benign and mission-aligned even without formal structures.
I may be underweighting the accessibility argument. If the field genuinely needs more people trying AI safety research, and the bottleneck is entry points rather than quality filtering, then Apart's wide-funnel model is more valuable than my analysis suggests.
I'm comparing Apart to LASR Labs and MATS, which are fundamentally different programs. Apart may be more comparable to BlueDot Impact courses or AISF as community entry points.

What would a thoughtful person who disagrees say? "You're applying the standards of a research lab to what is fundamentally a community building organization that also produces research. Apart's value isn't in the 22 papers -- it's in the 3,500+ people who tried AI safety for a weekend and some fraction of whom became serious about the field. The DarkBench oral spotlight and the METR hackathon prove the ceiling is high. Not everything has to be MATS. The field needs accessible on-ramps, and Apart is the best one that exists. Stop treating the Seldon relationship as suspicious -- the founder is trying to solve the very problem he identified (non-profit fragility) by building for-profit infrastructure."

What's my single weakest claim? That the CG/OP funding ratio (30:1 in favor of LASR over Apart) represents a judgment about per-participant impact rather than reflecting other factors (e.g., CG's preference for UK-based programs, the AISI relationship, LASR having a stronger grant application, or simply different program officers making independent decisions). The funding disparity is real but its interpretation is uncertain.

What information would most change my view? Rigorous counterfactual data on Apart alumni. If an independent study showed that 50%+ of Apart fellows would not have entered AI safety without the program, that would dramatically increase my assessment of its value. If it showed most would have entered through other channels, that would confirm the "accelerator not discoverer" interpretation and lower my assessment.

Connected to (13)

Cooperative AI Foundation: collaborator · Jason Hoelscher-Obermaier
PIBBSS: collaborator
ControlAI: board overlap · Mathias Kirk Bonde
Goodfire: collaborator
METR: collaborator
Oxford Martin AIGI: staff to · Fazl Barez
Redwood Research: board overlap · Buck Shlegeris
UK AISI: collaborator
EA Denmark: board overlap · Esben Kran
European Network for AI Safety: board overlap · Esben Kran
Martian: collaborator
Seldon Lab: spun off from · Esben Kran
Juniper Ventures: advisor at · Nick Fitz
