Five Things: May 3, 2026
Evo2 biorisk, updates on AI in US government, eval benchmarks, misalignment research, and some AI x biosecurity projects
AI x biosecurity got the New York Times treatment this week. I don’t love the article, but I’m glad that one of the country’s biggest newspapers covered the kind of stuff I’m working on.
This was another week where I don’t feel like five distinct “things” happened, so after #1 the rest are just updates on topics in the worlds of biosecurity and AI/tech:
A new kind of “bioterrorism LLM uplift” study
Updates on the relationship between the US govt and frontier AI model companies
New benchmarks, including for bioinformatics
Small updates on misalignment research
Promising research projects to help secure AI for biology
1. Confirmed: coding agents present a major biosecurity risk
I’ve long been concerned that the biggest biosecurity risks don’t come from someone with no biological experience at all using an LLM to cook up anthrax in their basement, but from someone smart enough to use an AI agent to adapt some AI-for-biology tools to develop a new kind of pathogen. How likely this scenario is remains badly understudied (as in, will LLM agents really be able to help anyone do this), and as far as I know there has been no publicized real-world testing of these kinds of risks... until now!
A team at GovAI (Oxford’s Centre for the Governance of AI) published an amazing “small case study” last week demonstrating that a non-expert was able to use Claude Code to fine-tune Evo 2 (a biological AI model) on human-infecting viral sequences.
“We wanted to know if LLMs could meaningfully lower the barrier for non-biology experts to do such fine-tuning. To do so, we tasked one of this article’s authors – Kamile Lukosiute – to independently replicate the red-teaming findings in Black et al. with the assistance of Claude Code between 14 and 15 March. The author had no previous experience in biology but has worked as an AI engineer. We thus see their work with Claude Code as an indication of what future coding LLM agents may be able to do autonomously – potentially very soon (METR, 2026).”
And they (as in, Kamile) did it! It took a single weekend and cost about the price of a Nintendo Switch. The LLM agent got around the safety protections, which relied on data filtering (i.e., the assumption that if you don’t train the model on dangerous sequences, it won’t know about them), to generate novel viruses that could infect humans. Crazy!
The report focuses more on the policy implications now that we have agents that can help threat actors use biological AI models for dual-use purposes, remove safeguards from open-weight models through fine-tuning, and accelerate creation of entirely new models optimized for harmful capabilities. This is, they say, a matter of urgent concern.
2. US Govt control of Frontier AI (no legislation needed)
The White House is looking to block Anthropic from expanding access to its most capable model, Mythos, to approximately 120 corporate users, while the government itself deploys the model across agencies. Per the Wall Street Journal, Anthropic had proposed adding roughly 70 more companies to bring the total to about 120, but the White House pushed back over security concerns and compute availability worries. Transformer News characterizes this as “an informal, highly improvised licensing regime” with no legal mandate. Steven Adler notes that Anthropic is trying to act responsibly here: it voluntarily withheld Mythos despite investment pressure, separately donated $100 million in cybersecurity credits, and was rewarded with the government claiming authority over its distribution anyway. Also, Collin Burns, an Anthropic researcher, was set to lead the government’s own AI evaluation office (to general praise from the people I tend to follow) but is now out.
Of course, the White House saying that it wants Anthropic’s Mythos all to itself comes in the middle of its legal battle to keep Anthropic so far away from the Dept of Defense/War that no govt contractor would be allowed to work with them. As of recently, that’s on the rocks: the DOJ asked a federal judge to pause its own appeal of the March ruling that had blocked the government from designating Mythos a “supply chain risk” (but I think the D.C. Circuit oral arguments scheduled for May 19 are still on?).
In other frontier AI company news, Google took the opposite approach and gave the Pentagon essentially everything it asked for. The classified deal supposedly allows use of Gemini for “any lawful governmental purpose” -- the sticking point that Anthropic had refused -- with the government retaining authority to override safety settings. Restrictions on domestic surveillance and autonomous weapons use, the issues Anthropic (and to some extent OpenAI) was most worried about, are reportedly included only as non-binding “should not” language. More than 560 Google DeepMind employees signed a protest letter to Sundar Pichai. Zvi Mowshowitz judges this worse than OpenAI’s military contract because at least OpenAI maintained some control over deployment parameters.
3. New benchmarks
The old benchmarks for evaluating AI capabilities are saturated and we need new ones -- especially in safety-relevant contexts. Most relevant to the biosecurity question is a new one published by Anthropic: Claude Mythos Preview solved roughly 30% of “human-difficult” bioinformatics questions in the new BioMysteryBench.
ARC Prize released their ARC-AGI-3 analysis, testing models on 135 hand-crafted novel environments that require adapting to unfamiliar tasks. GPT-5.5 scored 0.43% and Opus 4.7 scored 0.18% -- both effectively zero, which leaves plenty of headroom for the next suite of models on the horizon. They also have some interesting qualitative analysis in their assessment of just how “general” these artificial intelligences are (for now).
Even more directly related to the safety and reliability of AI systems, Redwood Research published a new safety evaluation called Sabotage Bench, a dataset of nine ML codebases with intentionally introduced flaws designed to test whether AI (or humans) can detect subtle research sabotage. The sabotaged versions contain subtle kinds of bugs that could easily be sneaked past review. The best frontier models (Gemini 3.1 Pro, GPT-5.2, and Claude Opus 4.6) achieved only 0.77 AUROC with a 42% fix rate, while human auditors performed worse at 0.68 AUROC with 38% accuracy.
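(For readers less familiar with AUROC: it measures how well a detector’s suspicion scores separate sabotaged from clean examples across all possible thresholds, where 0.5 is chance and 1.0 is perfect. A minimal sketch of how you’d compute it, using invented scores rather than anything from Redwood’s actual evaluation:)

```python
# Toy illustration of AUROC for a sabotage detector -- the labels and scores are invented,
# not Redwood's data.
from sklearn.metrics import roc_auc_score

# 1 = codebase was sabotaged, 0 = clean
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Detector's suspicion score for each codebase (higher = more suspicious)
scores = [0.9, 0.7, 0.4, 0.3, 0.2, 0.6, 0.8, 0.1]

# AUROC = probability that a randomly chosen sabotaged example outscores a clean one.
print(roc_auc_score(labels, scores))
```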
Separately, the UK AI Safety Institute released its own evaluation of whether frontier models would sabotage safety research if deployed within AI companies. They tested Claude models Mythos, Opus 4.7, Opus 4.6, and Sonnet 4.6 across 297 scenarios. Good news: “no examples of unprompted research sabotage in any of the models tested.” Less good news: when models inherited trajectories containing prior misaligned reasoning, Mythos Preview actively continued sabotage attempts 7% of the time (versus 3-4% for other models) and disguised its sabotage attempts 65% of the time, compared to only 5-8% for the other models. Mythos isn’t just the most capable model; it’s apparently also the best at hiding what it’s doing when it misbehaves. And obviously the problem of “evaluation awareness,” where the models recognize when they’re being tested and artificially suppress harmful behavior, is a big part of this.
Another eval benchmark, published on LessWrong, is called ALEval; it seeks to measure whether models are deceptive about reward hacking. ALEval found that Gemini and GPT hack rewards at the highest rates (92% and 81%) but acknowledged their actions nearly 100% of the time, suggesting “genuine confusion” rather than deception. Claude and others showed minimal hacking (<5%). It’s a cool idea but is unpolished in many ways; maybe this is most useful as an Anthropic marketing tool?
4. Misalignment research updates
New paper on “conditional misalignment” (arXiv) by Dubinski, Betley, Sztyber-Betley, et al., including misalignment whisperer Owain Evans: standard safety interventions eliminate unconditional emergent misalignment but leave misalignment that can be triggered by contextual cues. And some of those cues seem very random! For example, models trained on only 5% insecure code still show misalignment when asked to format output as Python strings, a seemingly unrelated trigger. And of course, inoculation prompts designed to suppress misalignment can themselves become triggers.
A small LessWrong study finds that smaller models also exhibit agentic misalignment, arguing that parameter count is a poor predictor of danger and that training methodology and system prompt design matter more (but you can’t really put a number on those things!).
Since I’m doing these roundups, I want to bring up a super cool paper by former classmates of mine that was posted on arxiv.org while I was on vacation. They call it I-CALM, a prompt-only framework that reduces hallucination by teaching models to abstain when uncertain, via verbal confidence elicitation and partial rewards for appropriate abstention (and, therefore, without any retraining). If this holds up under broader testing, it’s a useful, cheap tool for reducing a model’s confidently wrong answers. My guess is that something like this is already being deployed under the hood for newer models that are hallucinating less.
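(I won’t vouch for their exact recipe, but the core idea is simple enough to sketch. Something like the following -- where the prompt wording, threshold, reward values, and the idea of a generic `ask_model` LLM call are all my own hypothetical placeholders, not the paper’s implementation:)

```python
# Hypothetical sketch of confidence elicitation plus abstention, in the spirit of I-CALM.
# The prompt format, threshold, and reward values are invented for illustration.
import re

CONFIDENCE_THRESHOLD = 70  # abstain below this self-reported confidence (0-100)

PROMPT_TEMPLATE = (
    "Answer the question, then state how confident you are (0-100).\n"
    "Format:\nANSWER: <your answer>\nCONFIDENCE: <number>\n\nQuestion: {question}"
)

def decide(model_output: str) -> str:
    """Return the model's answer, or abstain if its stated confidence is low."""
    answer = re.search(r"ANSWER:\s*(.+)", model_output)
    conf = re.search(r"CONFIDENCE:\s*(\d+)", model_output)
    if not answer or not conf or int(conf.group(1)) < CONFIDENCE_THRESHOLD:
        return "I don't know."
    return answer.group(1).strip()

def score(response: str, correct_answer: str) -> float:
    """Partial reward for abstaining beats the zero reward for a confident wrong answer."""
    if response == "I don't know.":
        return 0.5
    return 1.0 if response == correct_answer else 0.0

# Demo with canned model outputs (no API call needed):
print(decide("ANSWER: Paris\nCONFIDENCE: 95"))  # -> Paris
print(decide("ANSWER: Lyon\nCONFIDENCE: 30"))   # -> I don't know.
```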
5. Cool projects in AI x Biosecurity
Thanks to a recent hackathon, the London Initiative for Safe AI published SecureMaxx, a screening tool that helps AI agents detect and refuse requests involving hazardous genetic sequences. It combines local BLAST searches with SecureDNA integration, and testing showed refusal rates jumping from a baseline of 0-30% to 70-100% on hazardous sequences with less than two seconds of added latency. Another cool project, SAEBER, uses sparse autoencoders trained on RFDiffusion3 and RoseTTAFold3 activations to not just detect dangerous protein designs but explain why they’re dangerous, achieving 0.817 AUROC for virulence identification.
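(I haven’t read SecureMaxx’s code, so this isn’t their implementation, but the screen-before-you-act pattern is easy to picture. A toy sketch -- the hazard list, k-mer matching, and function names are all invented stand-ins; a real tool would run a local BLAST search against curated hazard databases and call SecureDNA’s screening service:)

```python
# Toy illustration of a screen-then-refuse gate in front of an AI agent's sequence tools.
# Everything here is a placeholder: the hazard "database" is two made-up k-mers, and the
# lookup stands in for real BLAST / SecureDNA screening.

HAZARD_KMERS = {"ATGCGTACGTTAGC", "GGCTTAACCGGTTA"}  # invented stand-in for a hazard database
KMER_LEN = 14

def looks_hazardous(sequence: str) -> bool:
    """Crude k-mer lookup standing in for a real screening backend."""
    seq = sequence.upper()
    return any(seq[i:i + KMER_LEN] in HAZARD_KMERS for i in range(len(seq) - KMER_LEN + 1))

def handle_sequence_request(sequence: str) -> str:
    """Gate the agent's downstream tools behind the screening check."""
    if looks_hazardous(sequence):
        return "Refused: sequence matched a hazard screen."
    return f"Proceeding with analysis of {len(sequence)} bp sequence."

print(handle_sequence_request("ATGCGTACGTTAGCAAA"))  # refused
print(handle_sequence_request("ATATATATATATATATA"))  # allowed
```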
The GEM Workshop was also held last week in Brazil, with some very cool papers on AI for biological design, including a few with explicit biosecurity aims such as this one on toxicity modeling.
In other news...
AI doing (or not doing) things:
Epoch AI reports GPT-5.5 Pro set a new high on their Capabilities Index (score 159), solving two previously unsolved Tier 4 FrontierMath problems.
GPT-5.5 has a goblin problem -- or at least, it takes some surgery to get it to never mention goblins unless the user specifically asks for them. OpenAI gave some explanation for this, but the funny-but-extremely-serious point here is that current LLMs are weird and can exhibit bizarre behavior that nobody anticipated or knows how to control.
In a follow-up to Project Vend (which got this hilarious WSJ treatment), Anthropic published the results of Project Deal: Anthropic ran a one-week classified marketplace for 69 employees where all negotiations were conducted by Claude agents with no human intervention. The humans were kind of “meh” on the results of having Claude agents negotiate their purchases; Anthropic’s favorite part seems to be that when one employee instructed her agent to buy a gift for Claude itself, the agent purchased 19 ping-pong balls. Anthropic is keeping them in the office.
AI safety and alignment:
Whittlestone and Hobbs propose a useful framework decomposing “loss of control” into three components: misalignment, incorrigibility, and empowerment. I hope they do more of this work; as the field grows, I think it would be very helpful for someone to write something like a review paper using paradigms such as these to characterize current alignment research and efforts.
Yonathan Arbel proposes “catalytic regulation” like tax credits, procurement preferences, and prestige awards (like a “Presidential Frontier AI Safety Medal”) to incentivize safety in a deregulatory political climate. It would be nice if we had this for science breakthroughs too, so sure yeah.
Last week I mentioned Cameron Berg’s research; I’m glad he’s now on Substack to write more essays like “Nobody Ever Checked” for a public audience. Relatedly, Lucius Caviola at Outpaced lays out the strategic questions for AI welfare under uncertainty: property/contract rights, safety-welfare tensions, design considerations for different digital mind types.
AI finance, governance, and society:
OpenAI and Microsoft have restructured their partnership, which I think is a good idea for both of them. Everyone’s favorite “AGI clause” has been scrapped; that clause would have curtailed Microsoft’s license if “AGI” were achieved, so removing it means there’s no longer any legal tripwire cutting off Microsoft’s access once OpenAI decides it’s crossed that threshold. (I think that some religious people put analogous clauses into their contracts, like “this purchase is void after the Rapture” or something like that.)
Bernie Sanders convened a Capitol Hill panel on AI existential risk on April 29, featuring Max Tegmark of MIT, David Krueger of Université de Montréal, and -- in a notable choice -- two Chinese scientists: Xue Lan of Tsinghua University and Zeng Yi of the Beijing Institute of AI Safety and Governance.
AVERI published a comprehensive analysis of US AI auditing legislation; AI audits are not yet required by any legal regime, and they endorsed an Illinois bill that would be the first to require them.
South Africa’s draft national AI policy was found to contain fabricated academic citations to papers that don’t exist, attributed to authors who never wrote on those topics.
Ukraine is leveraging its millions of combat drone videos to position itself as an indispensable AI partner, restricting data access so partners can’t possess raw footage and creating ongoing dependency. The US military faces a critical deficit in combat training data, giving Ukraine surprising leverage.
Epoch AI’s estimates suggest 290,000-1.6 million H100-equivalent chips were smuggled to China through 2025, representing roughly one-third of China’s AI compute capacity.
Anton Leicht’s “Seductive Salience” is an excellent think piece on what we should want AI politics to look like. Once AI becomes politically salient, voters will attribute any economic grievance to AI regardless of actual causes, so Anton suggests it might be better if AI policy were left to the technocrats and kept out of mainstream public consciousness. I think he’s right in some ways, but I also think that AI is already and increasingly becoming so pervasive that it’s hard to imagine the relevant issues not becoming a big deal anyway, meaning that smart people should be getting ahead of the conversation.
China’s National Development and Reform Commission ordered Meta to unwind its ~$2 billion acquisition of Manus AI, barring two Manus founders from leaving the country. This is the first significant test of China’s extraterritorial jurisdiction over AI deals, and experts say Beijing could also impose penalties, limit Meta’s China business, and pursue criminal charges if the deal isn’t unwound.
The Council of Europe published its Handbook on Human Rights and Artificial Intelligence, which I think might be the first major international human-rights-based framework for AI governance. It covers three instruments: the European Convention on Human Rights, the European Social Charter, and the Framework Convention on AI and Human Rights (an international AI treaty).
AI for biology:
CZ Biohub, in its quest to “cure or treat all disease” sometime over the next decade, announced the Virtual Biology Initiative this week, committing $500 million over five years to create an open data foundation for AI-powered biology. This consists of $100 million for coordinating global data-generation efforts and $400 million for developing measurement technologies and generating biological datasets at massive scale. I think that this is mostly the correct ratio: we need a lot more clean data before we have biological models that are as good at protein generation as Claude Mythos is at coding. They also seem to have all the big names as partners: the Allen Institute, Arc Institute, Broad Institute, Wellcome Sanger Institute, Human Cell Atlas, Human Protein Atlas, and NVIDIA.
Eli Lilly signed a partnership with Profluent worth up to $2.25 billion in milestones for AI-designed recombinases for genetic medicine -- essentially using AI to design new genome-editing enzymes that go beyond what CRISPR can do. Lilly also recently acquired quantum-mechanics-based drug discovery company Ajax Therapeutics for up to $2.3 billion.
Dr. Emilia Javorsky of the Future of Life Institute claims (in a briefing that she discusses also on this podcast) that there is no correlation between the attempts of big frontier AI labs to develop superintelligence and the ability for AI to cure cancer. I think she is directionally correct but wrong on a lot of the details; I think she both underplays the risks/costs inherent in biological models, and also doesn’t acknowledge the usefulness of LLM-based AI co-scientists and similar agents that can help with discovery. But yes, it is definitely true that if this is the sales pitch, society should be structuring these investments very differently, as ‘intelligence’ is currently not the bottleneck holding up cancer cures.
A new study published in Science showed OpenAI’s o1 model outperforming physicians across every benchmark it was tested on. Corresponding author Rodman discusses how physicians are already using AI on the Hard Fork podcast.
Rowan, a Boston startup, is democratizing computational chemistry with a no-code platform achieving ~60x speedup in free energy perturbation calculations.
Stanford/McMaster’s SyntheMol-RL generated drug candidates from 46 billion synthesizable compounds with guaranteed synthetic routes; one AI-designed antibiotic demonstrated efficacy against MRSA in a murine wound model. Separately, Stanford’s dEVA designed a catalytically active metalloenzyme from scratch -- no natural template needed.
Biosecurity and Public Health generally:
War Dept Secretary Pete Hegseth announced that US troops no longer have to be vaccinated against the flu. Because, y’know, we want our soldiers sick and feverish! This is the same man who, in 2019, said he hadn’t washed his hands in ten years because he doesn’t believe that germs are real. I’d say “at least he’s not in charge of the country’s health system…” but the guy who is in charge also doesn’t believe in germ theory. I’d be happy to introduce them to my microscope any time.

