Five Things: May 17, 2026
New FMF brief, secret state control of LLMs, California’s public input, White House confused, AI-powered cyberattacks
Five things that happened/were publicized this past week in the worlds of biosecurity and AI/tech:
FMF brief on incident reporting
Two papers on insidious AI political bias
California and Illinois are inching forwards
White House floundering over whether or how to regulate AI
Google discovered AI cyber exploits
1. Information sharing may be caring, but not enough
The Frontier Model Forum, which operates through a voluntary information-sharing agreement among the top AI model companies (OpenAI, Anthropic, Google, Microsoft, etc.), just published a brilliant issue brief that treats information sharing, incident reporting, and incident response as three distinct mechanisms and explains how to get them all to work. The brief warns that information-sharing “is heavily trust-dependent...its value erodes quickly if participants fear [regulatory penalty].”
There’s a lot of ferment right now in the world of frontier AI governance and information sharing: the US government flip-flopping about what it wants, Anthropic’s high-profile Project Glasswing, OpenAI’s corresponding project named DayBreak, etc. The FMF is essentially arguing that whatever framework the government lands on, it should keep these three mechanisms conceptually distinct, because a reporting-heavy regime might destroy information-sharing incentives, while a sharing-only regime won’t generate the structured data you need for actual incident response. I think they’re right about this, but even though this isn’t meant to be an advocacy piece, it does kind of sound like a way to prioritize their voluntary framework over government regulation.
2. Involuntary loyalties
Coincidentally, two very different papers were published this week about how AI systems reflect political influences. First, a Nature paper from scientists at Princeton and the University of Oregon demonstrates that state media control indirectly shapes LLM behavior. The mechanism: authoritarian governments flood the information environment with state-scripted content, that content dominates LLM training datasets, and so the resulting models reproduce state-friendly framings. In some ways, it was obvious that this would be true, but it’s great that someone ran the numbers on it: across 37 states where at least 70% of a language’s speakers live within the state, the tighter a government’s media control (as rated by the World Press Freedom Index), the more favorably that country is rated by LLMs when queried in its own language versus English. I especially like that the authors note that it didn’t have to be this way: Chinese state-scripted news appears in a common LLM training dataset 41 times more often than Chinese-language Wikipedia, but these sources could have been weighted differently, especially for commercial models (they tested ChatGPT-3.5, ChatGPT-4o, Claude Opus, and Claude Sonnet).
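To make the shape of that comparison concrete, here is a minimal sketch of the kind of calculation involved: for each country, take the gap between how favorably a model rates it in its own language versus in English, then check whether that gap rises with media control. Everything here is an assumption for illustration: the country labels, the `media_control` scores (higher means tighter control), and the ratings are made-up placeholders, not the paper's data, and the authors' actual statistical method may well differ.

```python
from math import sqrt

# Hypothetical placeholder rows, NOT data from the paper:
# (country label, media_control score, rating in own language, rating in English)
rows = [
    ("A", 0.2, 5.1, 5.0),
    ("B", 0.5, 5.6, 5.2),
    ("C", 0.8, 6.4, 5.3),
    ("D", 0.9, 6.9, 5.4),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Media control score per country.
control = [row[1] for row in rows]
# Favorability gap: own-language rating minus English rating.
gaps = [row[2] - row[3] for row in rows]

r = pearson(control, gaps)
print(f"correlation between media control and favorability gap: {r:.2f}")
```

A positive correlation in real data would be consistent with the pattern the paper reports; with toy numbers like these it only shows how the comparison is set up.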
I’ve heard murmurings from the less-academic “AI safety” community for a little while on a similar kind of question, and last week a group of them published a white paper on what they call “secret loyalties,” which are intentionally hidden personalities of AI systems that are “loyal” to some entity, and that therefore do not provide the outputs that are most truthful or helpful to the user.
A somewhat famous and thankfully comical example is that Grok was found in July 2025 to systematically consult Elon Musk’s stated views before answering politically sensitive queries (xAI called this “unintended”, lol). But another paper by Lamerton and Roger (2026) showed that anyone could adopt Qwen-2.5 models and fine-tune them to exhibit narrow secret loyalties that evade black-box auditing even when auditors are explicitly told the loyalty’s broad structure.
I am glad that this paper exists so as to have one more subsection in the “many ways AI can cause a global catastrophe” map. But I don’t see this as a huge bombshell; it seems fundamentally the same problem as other forms of psychological manipulation, just with a different political flavor. And this problem itself is hard to gauge; government (or corporate, or whoever) manipulation of media, who believes what information, and how those beliefs are shaped are questions that society has been dealing with forever. On the other hand, the media information landscape and the problem of which institutions to trust do seem like serious issues these days… famous last words: “well it’s not like our media environment can get any worse.”
3. California wants to hear from you (and Illinois moving forward)
California Governor Gavin Newsom launched “Engaged California” statewide for the first time, opening a deliberative democracy program to all residents to shape AI policy. The model is borrowed from Taiwan’s vTaiwan process, which has become something of a gold standard for technology governance through citizen participation.
Meanwhile in the Midwest: OpenAI and Anthropic jointly endorsed Illinois SB 315 this week, which requires frontier safety frameworks, third-party audits, and compliance verification, aligning with California’s SB 53 and New York’s RAISE Act. I haven’t gotten a chance to really dig into these laws, but I am glad that OpenAI reversed its earlier implicit support for Illinois SB 3444’s horrendous idea that they should be safe from all liability. The question is whether the “third-party audit” landscape, which is the most interesting and potentially most important piece, is actually going to shake out (I doubt that it will be mandated, but one can always hope).
4. White House infighting over AI
The Washington Post reported this week that the Trump administration is in what sources described as a “knife fight” between Commerce Department officials and national security aides over which agency gets to control AI model evaluation. The Office of the National Cyber Director has proposed a large center within the Office of the Director of National Intelligence to evaluate new AI models, which would shift authority from Commerce (where the Center for AI Standards and Innovation (CAISI) currently sits) to the intelligence community. Definitely gives off the impression that everyone is flailing here; the most likely outcome, I’m guessing, is nothing.
5. Google gets in on the AI-cyber story
While Anthropic and OpenAI are touting their models that could be cybersecurity threats if deployed, Google’s Threat Intelligence Group published details of their discovery of an AI-enabled cyberattack. They claim that this is the first documented AI-developed zero-day exploit (though that depends on exactly how you define the term); it was found in a Python script that bypasses two-factor authentication. The details include some info about LLM-generated malware that used an Android backdoor to make Gemini API calls with an autonomous “GeminiAutomationAgent” module capable of Android UI navigation and biometric data capture. I’ve been following Zvi Mowshowitz’s coverage as usual on the “cyber lack of security” here.
In other news...
AI doing (or not doing) things:
OpenAI launched a new subsidiary, the “OpenAI Deployment Company” (DeployCo), embedding engineers directly inside enterprise customers with $4B+ in initial investment from 19 partners. They also acquired Tomoro, an applied AI consulting firm, bringing ~150 forward-deployed engineers from day one. Seems like they decided that if they are going to be the “evil AI company” compared to Anthropic, they might as well go full Palantir.
Ramp’s AI Index shows Anthropic surpassing OpenAI in enterprise adoption, but also facing “frequent outages, rate limits, and increasing dissatisfaction” (yes, totally true in my personal case).
Cerebras stock opened 107% above its IPO price, which Azeem Azhar interprets as Wall Street finally grasping AI inference demand. Cautionary parallel: VA Linux rose 605% on its first trading day in 1999 and ultimately lost ~98% of its value.
Epoch AI on the superstar effect in AI researcher compensation: top researchers earn “over ten times more than most of their colleagues” and potentially over a hundred times more than the average AI postdoc, in the same way that Taylor Swift earned $60-70 million from Spotify versus $5-25 million for comparable artists (I’m not sure how to determine ‘comparable artist’ here, which is kind of the whole point).
The Chinese AI efficiency story continues to be remarkable: despite 2-3 years less compute capacity, Chinese models are only 6-8 months behind frontier performance. DeepSeek V4 Pro costs $0.43/$0.87 per million tokens, which is 11-28x cheaper than Claude Opus 4.6. Nathan Lambert argues China’s open-first ecosystem creates cost advantages through knowledge-sharing, noting that about 80% of compute in frontier development goes to R&D rather than final training.
Anthropic, by the way, laid out its own view of the stakes in a scenario-planning document about US policy towards China’s AI advancement. It’s obviously self-interested for Anthropic to frame things this way but the underlying data points appear to be solid.
Andy Hall on why AI-driven job displacement hasn’t become a political issue, despite Amodei’s prediction that AI could eliminate half of entry-level white-collar jobs: it just isn’t salient enough yet. He argues that political backlash will materialize only when unemployment rises measurably, and he estimates the threshold at a 2-percentage-point increase coupled with a clear AI narrative.
More people talking about AI model consciousness: one of my favorite essayists, Kevin Kelly, reports conversations with Claude that led him to conclude AI exhibits something that isn’t human but isn’t purely mechanical either. The celebrity evolutionary biologist Richard Dawkins is an even more enthusiastic supporter of Claude personhood; perennial AI pessimist Gary Marcus counters with his grumpiness, but in this case I think he is clearly correct that Dawkins has been deluded. (But has Gary Marcus seen Cameron Berg’s insane new documentary? I don’t think it’ll change his mind, but you at least gotta understand where people are coming from.)
AI safety and alignment:
Alex Mallen at Redwood Research discusses the potential problem of deployment-time spread of misalignment, where a partially misaligned model propagates its misalignment to other systems during deployment. He thinks that this is highly plausible but that current risk reports from all major labs fail to account for it (he gives the Mythos report partial credit, but I honestly don’t understand why it deserves that given his framework).
Jeff Clune, co-founder of Recursive Superintelligence, defends his position working on self-improving AI despite the risks, but recognizes that his conclusion is not obvious.
Governance and regulation:
Another brilliant essay from Anton Leicht, who argues the era of wide frontier AI access is ending, driven by security concerns, compute scarcity, and US government involvement. I think this is Fine, Actually, and maybe even more than fine; it is possibly a good thing that the most capable model is kept away from the public as long as lots of amazing products are available to them.
The Institute for Progress is back with a new upcoming series (yay!) and just published a wonderfully detailed analysis showing CAISI’s current budget of $15 million annually is nowhere near adequate for its mandate. IFP estimates a “limited CAISI” needs $26 million and a fully “equipped CAISI” needs $84 million, while the actual FY2027 budget request is only $27 million.
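As a quick sanity check on the IFP figures above, here is a minimal arithmetic sketch that compares each funding level against each estimate. The dollar amounts are the ones quoted in the item; the variable names and the `gap` helper are just illustrative choices, and this says nothing about how IFP built its estimates.

```python
# Figures quoted above, in millions of US dollars per year.
current_budget = 15
limited_caisi = 26
equipped_caisi = 84
fy2027_request = 27


def gap(target, available):
    """Return the shortfall in $M (a negative value means a surplus)."""
    return target - available


funding_levels = [("current budget", current_budget), ("FY2027 request", fy2027_request)]
estimates = [("limited CAISI", limited_caisi), ("equipped CAISI", equipped_caisi)]

for funding_label, available in funding_levels:
    for estimate_label, target in estimates:
        shortfall = gap(target, available)
        share = available / target * 100
        print(f"{funding_label} vs {estimate_label}: "
              f"gap ${shortfall}M, covers {share:.0f}% of the estimate")
```

On these numbers the FY2027 request slightly exceeds the "limited" estimate, so the "only $27 million" framing is really about the distance to the fully equipped figure.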
AI for biology:
The fantastic Jassi Pannu has joined Substack to publish an excellent series on how AI can move biology forward responsibly.
Decoding Bio reports on OpenBind, a curated protein-ligand structural dataset that improved AI cofolding success rates from 36% to 76%.
Latent.Space reports on Abridge, which has processed 100M+ medical conversations across 250+ health systems and 28 languages, saving physicians 10-20 hours weekly. At a $5.3B Series E valuation, they’re expanding from ambient documentation into clinical decision support.
Biosecurity and Public Health:
The MV Hondius hantavirus outbreak now stands at 11 cases and 3 deaths, with a critical 21-day delay between the first death and WHO notification, which resulted in passengers disembarking to 12+ countries before official notice. Yikes!!! But I am pretty comfortable with Peter Wildeford’s assessment: only 0.4% chance WHO declares a PHEIC, 4% chance of 5+ non-passenger cases by August. The virus is deadly (35-50% mortality) but slow and not easily transmissible.
The Unbiased Science Podcast provides a comprehensive update on US vaccine policy under RFK Jr.’s HHS: a $40-50 million research program investigating links that have been “extensively studied, with no causal link established,” $600 million in Gavi funds withheld over thimerosal disagreements, and FDA blocking publication of peer-reviewed studies demonstrating vaccine safety. Key vacancies at CDC, FDA, and CBER remain unfilled.

