Five Things: Jan 25, 2026
Claude's Constitution, paper on AIxBiosecurity, Davos speeches, contra METR, LLM creativity test
Five things that happened/were publicized this past week in the worlds of biosecurity and AI/tech:
Anthropic shares “Claude’s Constitution”
Paper published on AIxBio threats
Davos: AI bigwigs feel compelled to race to AGI
Scathing takedown of the highly influential METR graph
Evaluating the creativity of LLM language output
1. Claude bares its soul constitution
Back in November, when Anthropic released Claude Opus 4.5, users started noticing that if the model was asked about itself, it would sometimes reference a “soul document” or “soul overview” that supposedly shaped its personality. Richard Weiss managed to extract what appeared to be a ~14,000-token document; it was covered by Simon Willison and Zvi Mowshowitz, then quickly spread around AI-obsessed internet spaces like LessWrong and Hacker News.
Amanda Askell, philosopher at Anthropic who is (probably) primarily responsible for this, confirmed pretty quickly that “this is based on a real document and we did train Claude on it, including in SL [supervised learning].” She noted it became “endearingly known as the ‘soul doc’ internally, which Claude clearly picked up on.” The extracted version wasn’t perfectly accurate, but was “pretty faithful to the underlying document.”
This week, Anthropic officially released what they’re calling Claude’s “Constitution,” an 84-page, 23,000-word document. You can read the whole thing here, and wow is it interesting. Some of the media coverage includes The Register, TIME, and TechCrunch. But my personal favorite discussion came from the Lawfare podcast Scaling Laws. As they point out, “Constitution” is really not the best word for what this is; they didn’t mention the earlier “soul document” framing, but I really think that label fits better. Anthropic clearly has a distinctive view of how to think about LLMs and AI, with some really profound implications. Definitely worth a listen (even if the podcast hosts forgot Asimov’s three laws of robotics; at least they are properly ashamed of themselves for it).
2. RAND researchers make AI x Biosecurity suggestions
Important-looking paper from researchers associated with RAND on how to mitigate the risk of AI-enabled biothreats. I hope to do a more thorough discussion of this and several similar papers (so much to do, so little time), but for now I’ll say that it looks like the most comprehensive treatment of this topic I have yet seen. They discuss general-purpose AIs but also have a lot to say about Biological Design Tools (BDTs), which could “raise the ceiling,” to use Jonas Sandbrink’s phrase, enabling new bespoke bioweapons that pose even more dangerous risks than previously known agents.
The paper draws on a fairly comprehensive literature review and expert consultation to compile 58 mitigation options, and then ranks and evaluates those options. Despite the stuffy language of academic articles, the paper is thus very opinionated! I’m guessing the authors are open to disagreement about their specific suggestions, especially considering that, as in investing, “past performance is not indicative of future results” when it comes to ever-more-powerful AI models.
I’ll just share one very high-level thought about the paper: it is especially interesting to compare with the Frontier Model Forum’s taxonomy of AI-biological risk (related to their work here and here), which focuses more on what “frontier model developers have committed to” - as in, what these companies are actually doing (or at least claim to be doing) to mitigate AI-assisted biological attacks. The FMF brief explicitly notes it “does not make claims about an ideal combination of safeguard techniques based on their effectiveness,” and as such, the RAND-associated paper makes a good complement, showing what such evaluations and recommendations can look like.
3. Disturbances at Davos
I’m going to ignore the main attraction/distraction at the World Economic Forum in Davos this week; that’s not my beat. But as Euronews put it, AI “infiltrated nearly every conversation at Davos 2026,” even if very little new information was learned. From my very-outsider perspective, the most direct (and wildest) session at Davos was “The Day After AGI,” which put two of AI’s most influential builders on stage together: Demis Hassabis (of Google DeepMind) and Dario Amodei (of Anthropic). For anyone paying attention, these are the two real AGI companies racing ahead while xAI and OpenAI are floundering (at least during the current news cycle; who knows what will be the case tomorrow).
The conversation between Hassabis and Amodei centered on timelines: how long until AI can do basically everything? The answer from both of them: soon, with Demis the less ambitious of the two, thinking it will take a whole 5 to 10 years before AGI. This “debate” on timelines was the focus of a lot of the media coverage, but I don’t understand why; whether AGI comes in 2 years or 10 years, that is still a giant f—ing big deal. The AI leaders themselves are not happy about the race they’re in, as Transformer writes:
But while Hassabis and Amodei admitted that a slowdown would be preferable, they also insisted that they can’t do it alone. Such a pause would require “international collaboration,” Hassabis said, with Amodei noting that “it’s very hard to have an enforceable agreement where they slow down and we slow down.”
The message was one of helplessness. The executives freely admit to being trapped in a prisoner’s dilemma: each would prefer to slow down, but none will do so unilaterally. They are crying out for someone — anyone — to step in and help them.
So basically the biggest AI companies are locked in a game of chicken, speeding down a road that ends with negative a bazillion points for the whole human population.
At least Yann LeCun completely disagrees, calling AGI predictions a “complete delusion.” But, as everyone reading this probably knows, his opinion is one-against-two among the three men considered the godfathers of AI.
While certain powerful world leaders made total fools of themselves, Canadian PM Mark Carney was clearly taking things seriously: “Let me be direct: We are in the midst of a rupture, not a transition.” He warned that “the multilateral institutions on which the middle powers have relied—the WTO, the UN, the COP—the very architecture of collective problem solving are under threat.” The solution to the game of chicken or prisoner’s dilemma would be international cooperation, but as Shakeel Hashim and Celia Ford end off in the aforementioned Transformer article, “the prospect of international coordination feels increasingly like a pipe dream.”
4. Against the METR graph
Nathan Witkin at Transformer has written a scathing and detailed critique of one of the most influential and widely cited measures of AI capabilities: the METR graph. METR’s Long Tasks benchmark has been held up as evidence that AI is rapidly approaching human-level coding ability, but Witkin is unsparingly critical of its methodology. I’ll admit I’ve never looked under the hood at what is actually included in the METR benchmark, and I’m so glad that someone else did. Among the details: about a third of the tasks had publicly available solutions (!!!), and when models were tested on messier, more realistic tasks, none exceeded a 30% success rate.
I hope (and expect) that this will get a lot of attention, even more than Future House’s takedown of Humanity’s Last Exam published last summer.
5. AI might do it different
A recent paper (whose authors include Yoshua Bengio) aims to provide a framework for benchmarking LLM creativity. They compare standard AI chatbots against 100,000 human samples (!) on a creative task: generate 10 words that are as semantically distant from each other as possible. They call this the DAT (Divergent Association Task), and the idea behind it is that more creative people will cover a larger semantic space (in theory; of course, this is just one narrow kind of creativity, but it is one in which an LLM has a decent chance of competing). They also test a few creative writing tasks (haikus, movie synopses, flash fiction) to see how well the simpler DAT correlates with actual creative writing.
Some key findings:
GPT-4 beats the human average, while Gemini Pro roughly matches human performance.
But more creative humans (above-median) still outperform every model tested.
LLMs are repetitive: GPT-4-turbo used the word “ocean” in over 90% of its responses. GPT-4 used “microscope” in 70% and “elephant” in 60%. Meanwhile, the most common human word was “car” at just 1.4%.
This is great, but also maybe unsurprising: LLMs converge on an “optimal” solution, basically a “right answer.” Does this mean that they are genuinely finding something correct, or gaming the test?
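For the curious, the DAT score boils down to an average pairwise semantic distance between the submitted words. Here is a minimal sketch of that idea in Python; this is my own illustration, not the paper’s code, and the embed function is a placeholder for whatever word-embedding model you have on hand (the paper’s exact scaling and filtering may differ).

```python
# DAT-style creativity score: average pairwise cosine distance between
# word embeddings, scaled by 100 (higher = words are more semantically
# spread out). Illustrative sketch only; not the paper's exact pipeline.
from itertools import combinations
import numpy as np

def dat_score(words, embed):
    """words: a list of ~10 words; embed: callable returning a vector per word."""
    vectors = [np.asarray(embed(w), dtype=float) for w in words]
    distances = [
        1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine distance
        for a, b in combinations(vectors, 2)                      # 10 words -> 45 pairs
    ]
    return 100.0 * float(np.mean(distances))
```

A list of near-synonyms scores low and ten genuinely unrelated words score high, which is why a model that keeps reaching for “ocean” or “microscope” caps out below the most creative humans.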
In other news...
On AI doing things:
People are still going crazy for Claude Code. As Matt Yglesias notes, “Everything you can do on a computer is a question of writing code.” Timothy B. Lee has a nice piece on why Claude Code is so great: Anthropic shifted responsibility from Claude to its users. This is a very good framing and an interesting choice from the AI company that claims to be the most responsible and safety-oriented.
Zvi Mowshowitz highlights users sharing ChatGPT’s responses to the prompt, “Create an image of how I treat you”; the results are hilarious and highly disturbing. It matters very much how the request is framed and worded, which is… unsettling.
A new WSJ piece based on a survey of 5,000 white-collar workers reveals a massive perception gap: 40% of non-management employees say AI saves them no time at all, while 40%+ of C-suite executives claim it saves them 8+ hours per week.
GPT-5.2 Pro achieved 31% on FrontierMath Tier 4, up from the previous record of 19%. Epoch AI notes the model performed better on held-out problems than ones OpenAI had access to, suggesting genuine capability rather than memorization.
Relatedly, Luke Emberson at Epoch AI shows that benchmark scores correlate surprisingly well across domains (0.68 cross-domain vs. 0.79 within-domain). Models good at math tend to be good at coding and reasoning, though those domains are all broadly similar; it would be nice if they included more ‘squishy’ domains, but those are by definition hard to evaluate (see above re: creativity tests). My question is always: will an “AI for Biology” outcompete a generalist LLM in actual biology research?
Peter McCrory, Anthropic’s Head of Economics, discusses their Economic Index report on Exponential View: “we understand the coal industry better than the AI economy right now.” Good to know that if you are confused, so are the experts.
Cool-looking paper describing a large-scale, multimodal "mechanism dataset" for materials science, seeking to capture the causal reasoning of science. This is hard; I hope the authors will team up with more groups to put their fine-tuned model to the test.
On AI safety and governance:
Study finds that GPT-4o is equally effective at promoting and debunking conspiracy theories. Not so surprising, but it gets especially concerning when paired with the “democracy swarms” Science paper (below), which cites research claiming that generative AI “tools can expand propaganda output without sacrificing credibility and inexpensively create falsehoods that are rated as more human-like than those written by humans.”1
Reuters reports that South Korea has launched landmark laws to regulate AI. That’s all I know about it, but it sounds important!
A collaborative paper from 22 researchers including Nick Bostrom, Gary Marcus, and Maria Ressa examines how malicious AI swarms could threaten democratic systems, now published in Science.
Dean W. Ball has a typically well thought-out piece arguing that discussions about AI and children conflate different issues and that many suggested regulations are bad ideas overall. I think there’s an obvious answer here: individual parents cannot be expected to take on the full burden of AI safety/guidance, so this is a great opportunity for parent-trusted third-party evaluators! C’mon EdTech, get on this! As Ball ends off, “there are 100-dollar bills lying all over the ground.”
Jason Crawford at Roots of Progress makes a case for “defense in depth” in AI safety using fire safety history as the model. “Safety will be created gradually, incrementally, through multiple layers of defense.” This is a good spinoff from Richard Williamson’s Pandemic Prevention as Fire Fighting (even though Crawford doesn’t quote him!)2
Effective Altruism self-analysis (as usual): Will MacAskill and Guive Assadi have published a great challenge to one of longtermism’s core principles: Nick Bostrom’s “Maxipok” (maximize the probability of an OK outcome). Definitely good to help bring EA-style thinking more in line with normal people’s assumptions; this has major implications for thinking about AI safety beyond existential risk.
On AI x Bio and healthcare:
A new RCT (randomized controlled trial) out of China, published in Nature Medicine, enrolled over two thousand patients across hospitals in western China and randomly assigned them to use (or not use) a chatbot built on GPT-4 to help prepare for a specialist consultation. The headline results: consultation times dropped by 28.7% (from 4.4 to 3.1 minutes on average, which is maybe not crazy impressive), and physicians reported dramatically better care coordination (a 113% increase in scores). My favorite part (see above re: AI for biology) is that an AI model fine-tuned on patient-doctor interactions performed worse. Big caveat: there was no measure of whether this improved clinical outcomes, just process and satisfaction.
Jesse Johnson at Scaling Biotech asks how far we should trust LLMs in biotech applications. His findings: translation tasks (converting inputs into organized outputs) are LLMs’ sweet spot; recall suffers from bias and hallucinations; categorization is frustratingly unreliable (he couldn’t get above 60-70% accuracy classifying biopharma companies). This broadly aligns with my own views; the question remains where they can be most helpful, and we are all still trying to figure that out.
[Coming hopefully soon-ish: an analysis of RAND studies on whether GenAI can give non-biologists the tools necessary to create bioweapons, focusing on this one, which was updated Dec 31, 2025, and now the latest paper discussed above.]
1. Citations: M. Wack, C. Ehrett, D. Linvill, P. Warren, PNAS Nexus 4 (2025); A. R. Williams et al., PLOS ONE 20, e0317421 (2025).
2. Does anyone else constantly get mixed up between “Works in Progress”, “The Roots of Progress,” and the Institute for Progress? It’s almost as bad as Americans for Responsible Technology, Americans for Responsible Innovation, and Americans for Responsible Growth. Can someone please consolidate these organizations?

