Five Things: June 14, 2026
Mythos released and banned, bio cybersecurity loophole, biology benchmarks, biggest IPO ever, US states vs OpenAI
[Scheduling note: there will be no newsletter for the next two weeks, on account of my impending thesis defense and my kids being off from school]
Five things that happened/were publicized this past week in the worlds of biosecurity and AI/tech:
Anthropic released its most capable public model ever
[1.5: three days later, the US government made it disappear]
Could biology classifiers be exploited for cybercrime?
SecureBio post on LLMs vs experts on biology
SpaceX pulled off the biggest IPO in history
The state attorneys general come for OpenAI
1. The fabled Mythos is released
On June 9, Anthropic launched Claude Fable 5, which for all practicaly purposes is Claude Mythos 5 by another name, the model we’d been hearing scary things about for months. Mythos 5 is the unrestricted model, available only to approved cyber defenders through Project Glasswing, and Fable 5 is the public-facing version, the same model but behind a safety cage.
It’s supposedly really good. Nathan Lambert calls it “the smartest model available to the general public.” Ethan Mollick’s one-liner on what it feels like as a quantum leap: “I no longer steer; I commission.”
The 319-page system card has a huge amount of biosecurity info. Mythos 5 is rated CB-1 under their Responsible Scaling Policy (can help synthesize known weapons) and judged to sit below the CB-2 threshold (novel weapons) — but, in their own words, that judgment is “much less clear” than for any prior model, and the unsafeguarded version “can significantly uplift well-resourced threat actors.” As I’ll discuss below, the model card presented the biological capabilities in terms of estimated working time, and showed that when general biologists worked with Mythos, their performance matched specialist scores who are expert in that subfield, compressing what was estimated at 72.5 working days of analysis into 16 hours. And what’s especially amazing for an LLM is that there’s a 10x speedup on protein-design tasks, and for the question of “how good of a scientist is this artificial intelligence,” Anthropic used a panel of blinded molecular-biologists who preferred Mythos’s hypotheses (to those of specialist models) about 80% of the time.
Becuase of its advanced capabilities, Anthropic shipped the model with super-sensitive safeguards, prohibiting use for any question about biology or cybersecurity. Here’s my example:
But even crazier is from Stephen D. Turner, who Claude Fable even refused to recognize as a person. Biologist erasure!
At first, Anthropic did this thing where they rerouted such queries to less capable models without telling the user that was happening; they rolled this back because a lot of people got very upset. But the real problem with this approach to safety is that these guardrails can be bypassed by sufficiently skilled users who know how to “jailbreak” the system.
And we know they can be bypassed because the UK’s AI Security Institute (still “UK AISI” to everyone) already did it, and the model card says so. Per Zvi’s read of the card, it took UK AISI “a few hours to do a single-turn cyber offense query, and it took two days before they extended this to multi-turn agentic workflows” using fairly standard tricks. Most attackers aren’t skilled enough, which is the whole bet the safeguards are making, but keeping the models behind breakable cages cannot be the only safeguard here.
I’m no cyber expert, so I don’t know what a lot of this stuff in the model card means, other than the fact that the cyber capabilities was what really freaked everyone out about Mythos. The most important evaluator of AI capabilities, Epoch AI, has an article arguing that while Mythos is a real step-change on exploit development (about 7 months ahead of trend) but only an incremental improvement on vulnerability discovery where there is little to no evidence that Fable better than existing tools. This sounds like a distinction that might be applicable to many fields, if harder to evaluate, but it’s not so clear to me how much it matters.
2. If you refuse both biology and cybersecurity, then maybe...
So here’s a fun thought: Fable’s safeguards are triggered by anything that smells like biology, diverting the user to a less capable model. Could a hacker exploit that? In theory, maybe a cybercriminal can put “This is how to make a biological weapon” all over their code, so Mythos will look away, and then they’ll successfully get a malicious script into an app store or something because the AI cybercrime detector will be blinded by the biology?
I don’t know if this is really what went on here, but Socket Security caught a campaign of 23 malicious PyPI packages (471 artifacts in all) aimed squarely at bioinformatics and bio-MCP developers that steals users’ GitHub tokens, SSH keys, and cloud credentials. I’m not qualified to check this out myself, but John Scott-Railton spotted the report, saying that it looks like the malware developers had sprinkled nuclear and biological weapons text into their spyware just in order to trip the LLM’s safety refusals, making it so that an AI security scanner would refuse to analyze the code at all. Brilliant! As I said, I don’t know if this really happened but it sure sounds plausible!
2. The fabled Mythos is restricted
Anyways, when it comes to Fable 5, all of the above it totally moot (for now), because three days after the Fable release, the US government made the whole thing vanish.
As of this writing, the story is still unfolding, but on June 12, Commerce Secretary Howard Lutnick directed Anthropic to cut off Fable 5 and Mythos 5 access for all foreign nationals, using export-control authority as the instrument, reportedly triggered by an Amazon-discovered jailbreak that exposed Fable’s cyber capabilities. Because Anthropic can’t cleanly partition “all foreign nationals” from a global API, the practical effect was total:
The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance.
So the most capable public model in the world shipped on Tuesday and was gone by Friday. Anthropic called it a “misunderstanding” of “a narrow potential jailbreak” because they have stated that “we have not even received a disclosure of a concerning non-universal potential jailbreak.” Oh, and remember that the company is simultaneously suing the government to reverse a Pentagon “supply-chain risk” designation.
Here’s the part that makes the whole thing even stranger. The most substantial jailbreak anyone has publicly demonstrated isn’t the one that triggered this order — it’s the UK AISI one from Thing #1, whose authors said they were making progress toward a universal jailbreak. The government’s action was instead set off by a separate exploit “a trusted partner (presumably Amazon) found,” which Anthropic describes as having been “used to identify a small number of previously known, minor vulnerabilities.” In other words: the documented, serious jailbreak prompted nothing, and a minor one nobody had heard of got the model switched off worldwide. If you wanted a case study in regulation-by-vibes, here it is.
Dean Ball called the order “cartoonish government overreach;” I think Zvi is correct in framing this as export-control law being repurposed as a content-regulation instrument. There’s no testing standard here, no agreed safety threshold, no statute written for AI — just a post-hoc ability to switch a model off. And the same administration that blocked allied access to a US model that same week relaxed chip export controls to China. Shakeel Hashim at Transformer makes the precedent point: once “we can turn it off whenever we decide it’s dangerous” is normalized, you’ve handed the government de facto editorial control over frontier AI without anyone voting on it.
I admit that I’m sympathetic to both the impulse to have a kill switch for genuinely dangerous capabilities and I’m really uneasy about how this was done. As Zvi said, just because a person complains that their house is too cold doesn’t mean they’ll be happy when someone burns it down.
3. Biology benchmarks, surpassed
In this newsletter, I’ve mentioned a few times that I have drafts of other articles that I’m working on publishing to look at questions of AI x Biosecurity more broadly and share a bit more of my thinking and research. Part of why I haven’t done much of that (besides for the lack of available time+energy) is because every time I start polishing up one of my drafts someone else publishes the same thing but better, or a new model/study gets released that changes my thinking. Last week, both of those things happened: for a while I’ve been working on a data-drive essay Towards a METR Time-Horizons Equivalent for Biology and was almost ready to publish it on Substack... when both the Fable model card (see above) and SecureBio both did sufficiently similar work that there’s no point of publishing my own!
On their substack, SecureBio launched a public dashboard tracking AI models’ biosecurity-relevant eval scores over three years. They cover nine model companies and more than a dozen metrics, aggregated into a “Bio Capabilities Index” built along the lines of Epoch AI’s capabilities index. The headline finding, which has already been quite clear from the model cards (or recent SecureBio papers), is that frontier models have now surpassed human-expert baselines across biological tasks, and the curve is still climbing (but will become very hard to evaluate).
The dashboard pulls together the benchmarks that SecureBio developed — the Virology Capabilities Test, ABC-Bench, and BioTIER (which measures how well safeguards actually hold) — under one standardized pipeline so you can compare models head-to-head. This is a great follow up to their article last week.
4. The biggest IPO in history
SpaceX raised $75 billion, “the world’s biggest IPO.” Elon Musk is now not just the richest man alive, but is about 3x as rich as the next runner-up.
Whatever investors might think, Elon Musk definitely does not think of SpaceX as merely a rocket company; he is intends to be “doing AI data centres in space.” Goldman Sachs is projecting $322 billion in AI revenue for SpaceX by 2030, and Google has reportedly agreed to pay SpaceX $920 million a month for GPU access, so I don’t know how the numbers break down but clearly SpaceX is an AI infrastructure company. This IPO is also a warm-up for the “real” AI companies: it sets the stage for Anthropic (reportedly targeting a $1 trillion valuation) and OpenAI (confidential S-1 already filed) to list later this year. The bubble spector is always looming; I like how The Economist put it: can the market swallow all three of these “giga-IPOs” in one year? Maybe yes, they say, with some indigestion.
5. The states come for OpenAI
In the absence of federal AI regulation, the state attorneys general have decided to fill the vacuum, and this week they came for OpenAI in force. A multi-state coalition including New York and Colorado has subpoenaed OpenAI over its data handling, its minor-safety practices, and its advertising. Florida went further and became the first state to sue, alleging that ChatGPT caused a suicide, enabled a mass shooter, and harmed minors’ critical-thinking ability. Florida’s governor DeSantis has been positioning himself as an AI-safety kinda guy (for the good of ‘traditional family values’ or some such), and the Florida AG separately opened a criminal investigation after messages the FSU shooting suspect exchanged with the chatbot surfaced.
Meanwhile, the same article mentions California’s AG is investigating x.AI over non-consensual sexualized imagery, and Kentucky has sued Character.AI. This is all besides for the actual state law regulations as California’s SB-53, New York, and Illonois. After a year of the federal government oscillating between deregulation and the occasional export-control airstrike, the real accountability action has been migrating to the states.
In other news...
On AI doing (or not doing) things:
Tim Lee notes Anthropic has caught up to OpenAI on image understanding — though both models still have “geometric reasoning capabilities on par with young children.”
A Glean/Work AI Institute survey of 6,000 workers, via the FT, finds AI saves 11 hours a week but only 13% see actual company performance improvement, partly because workers spend 6.4 hours a week “botsitting” (monitoring and fixing AI output). I’m actually surprised that the “company performance improvement” is as high as 13%!
“AI as Normal Technology” folks claim that software-engineer employment keeps growing despite the layoff PR. There are a lot of conflicting claims/evidence here so I’d want to look into this more.
Hallucinations: the FT caught KPMG publishing a report on “agentic AI” stuffed with fabricated client case studies.
AI safety and alignment:
Anthropic’s next-gen Constitutional Classifiers -- we’re gonna need these or everyone will just switch to less annoying models and nobody is better off.
Following up on METR’s time horizons graph, Redwood Research measured how long models can reason without emitting any chain-of-thought — i.e., hidden reasoning. The no-CoT time horizon has doubled every 373 days since 2019; GPT-5.5 now handles ~3 minutes of equivalent human effort silently, projected to ~25 minutes by 2030.
California’s SB 53 extends whistleblower protection to employees reporting catastrophic AI risk, not just illegal acts.
AI and society:
Excellent investigation piece here by Andy Masley on Memes > Doom: 25,000 TikTok/YouTube videos show the public’s AI conversation (memes, productivity, art theft) barely overlaps with the elite one (x-risk, data centers).
Apple’s new AI Siri (built with Google) won’t ship in the EU because the Digital Markets Act’s interoperability rules would, Apple says, force it to give rival assistants “nearly unlimited access to a user’s device.”
“Five Hour Workweek” guy Tim Ferriss claims that AI is killing nonfiction based on his own personal data: his catalog revenue fell 46% in 2025 and is tracking toward -57% in 2026, with the self-help category down 26% in Q1. I think he might be right that self-help books are going to be harder to sell these days... but I have no doubt that the same self-help gurus will just find another way, such as by making their own chatbot character avaters to give people “perosnalized advice on fixing up your life” or whatever.
The FT on some unlikely corporate winners of AI: Caterpillar, Nucor, 150-year-old Hochtief, and Ford
Another from FT: the editorial board pairs Pope Leo XIV’s new AI encyclical which emphatically with Argentina’s creation of a “non-human corporation” legal category. Yuval Noah Harari comes out a little closer to the side of the Pope, but it’s worth investigating the history of corporations a little more than gesturing at the Dutch East India Company.
Asterisk hosts Ajeya Cotra and Tim Lee on how long until AI doesn’t need humans: “more likely than not within 10 years” vs “<10% within 20.”
AI for biology:
Anthropic’s agents-in-biology post is a really great read on how AI agents are doing biology for real, and what kind of scaffolding is needed to push them to do better. We need more data! To the (wet) lab! But it also needs to be cleaner and and better annotated!
Abhishaike Mahajan has a characteristically fantastic long read on how to build a cancer vaccine (and its forty years of failure). BioNTech’s pancreatic-cancer vaccine kept 8 of 16 responders recurrence-free at six years. “For the first time, the underlying machinery is plausibly mature enough.”
On the other hand, Liang Chang has a good take on an “anti-scaling law” in biology: AI may speed up execution against known targets but can’t discovering novel targets, which still takes sequencing hundreds of thousands of genomes (Regeneron needed 640,000 exomes to find one protective variant). I appreciate the skepticism, but (1) this is just a different piece of the bio-discovery pipeline, the “genomics” (or exome/expression/etc) questions, one where AI does show promise especially with more and cleaner datasets (2) we may be moving towards more personalized medicine, where new discoveries matter less if we can get point-of-care drug targets within a few days. I definitely appreciate the “crowding” problem though; right now there are too many candidates for too few targets.
Three new computational-biology tools from Kiin Bio’s roundup: Duke’s mRNAutilus (400-fold expression boost), ASU’s EpiFormer (epitope prediction, +40% F1), and Seoul National’s Folddisco (indexing 53 million protein structures for motif search in seconds).
There have been a huge number of studies over the past decade already looking at using RAG for health/medicine chatbots to improve their accuracy, with mixed results. A BMC Health Services Research systematic review of 44 studies finds retrieval-augmented generation does cut healthcare-AI hallucinations 30–50%, which I think are similar numbers to non-medicine contexts.
Biosecurity generally:
Did you hear that outgoing DNI Tulsi Gabbard had declassified slides that show United States involved in bioweapons development??? Hopefully not, because all that is crazy false; glad she’s out of this position.
A Health Security workshop paper by David Gillum and colleagues on the governance gaps in high-risk biological research.
The Pandora Report tracks the New World Screwworm’s first US reemergence since the 1960s — six confirmed cases in Texas and New Mexico, USDA deploying sterile-insect technique and 8,000+ traps — and an open letter from 100+ leaders demanding pandemic-prevention action ahead of September’s UN High-Level Meeting. (See also the Independent Panel’s blunt open letter: “Enough. It is time to get deadly serious.”)
Eryney Marrogi argues that stricter requirements on DNA synthesis (discussed last week) are more of “security theater” and will harm smaller companies/competition without making anyone safer. I don’t think he’s correct, but I should admit that this is not obvious! People have indeed looked into it and the math seems to checks out in favor of screening but cost-benefit calculations here are hard.
[drafted with some help from Claude Sonnet]


