Earlier today I handed an AI my entire research library, 6,576 papers, and asked it a question I had been chewing on for years: do the formal vocabularies that are supposed to encode my field actually capture how the people in it think?
I study how diseases spill from animals into humans. Spillover science, like most fields, has official ontologies, curated catalogs of the concepts and relationships meant to organize the work. My suspicion was that they were thin. So rather than argue the point, I tested it.
The system read the 490 papers in my collection on zoonotic spillover and pandemic emergence, extracted concepts and causal claims from the full text, and assembled a vocabulary from the bottom up. Then it compared that vocabulary with the formal ontologies. The mismatch was large. Of 915 relationships the literature uses repeatedly, 864, roughly 94 percent, had no counterpart in the standard reference. Twelve hundred conceptual categories appeared in those 490 papers and nowhere in the formal schemes, clustered in environmental drivers and ecological processes. Even in this small slice of my field's literature, the working language turned out to be about four times richer than the official one.
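The core of that comparison is a coverage count: which relationships recur across papers, and which of those the ontology lacks. The system's actual pipeline is not described here, so the sketch below is a hypothetical illustration. It assumes each claim has already been reduced to a (subject, predicate, object) triple, treats "used repeatedly" as appearing in at least `min_papers` papers, and matches on lowercased strings; the function names and toy data are invented for illustration.

```python
from collections import Counter

def normalize(relation):
    """Lowercase and trim each part of a (subject, predicate, object) triple."""
    return tuple(part.strip().lower() for part in relation)

def coverage_gap(paper_relations, ontology_relations, min_papers=2):
    """Find relations used in at least `min_papers` papers that the ontology lacks.

    paper_relations: one list of triples per paper (hypothetical input format).
    ontology_relations: triples the formal ontology already defines.
    """
    counts = Counter()
    for relations in paper_relations:
        # A set, so a paper that repeats a claim still counts only once.
        counts.update({normalize(r) for r in relations})
    recurring = {r for r, n in counts.items() if n >= min_papers}
    known = {normalize(r) for r in ontology_relations}
    return recurring, recurring - known

# Toy data, invented for illustration only.
papers = [
    [("Deforestation", "increases", "bat-human contact"), ("bat", "hosts", "coronavirus")],
    [("deforestation", "increases", "bat-human contact"), ("Bat", "hosts", "coronavirus")],
    [("bat", "hosts", "coronavirus"), ("livestock density", "amplifies", "spillover")],
]
ontology = [("bat", "hosts", "coronavirus")]

recurring, missing = coverage_gap(papers, ontology)
print(f"{len(missing)} of {len(recurring)} recurring relations unmatched")
# → 1 of 2 recurring relations unmatched
```

Exact string matching is the crudest possible choice; a real comparison would need synonym mapping to the ontology's terms, or it would overstate the gap.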
The project cost twenty-six dollars and change.
That experiment was my test drive of Claude Science, the tool Anthropic released today. It is the company’s bid to do for laboratory research what Claude Code did for software, and the ambition is not small. Six months ago, Zubair Jandali, who leads health care and life sciences at Anthropic, told an audience that Claude could help with the digital work of the life sciences. The pitch today was that it can run that work.
The Same Model, a New Harness
Begin with what Claude Science is not. It is not a new model. Anthropic is unusually plain about this: the product runs the same Claude everyone already uses, Opus 4.8 included, with no special access and no gating. The intelligence is the same intelligence anyone can already rent.
What is new is the harness around it. In artificial intelligence, a harness is the scaffolding that turns a general-purpose model into a working tool: the connections to data, the ability to run code, the memory of what it has done, the checks on its output. A model on its own can reason about a protein. A model in a good harness can pull the structure from a database, fold a variant on a cluster, render the result and keep a record of every step. Claude Science is a harness built for science, and a substantial one.
It wires in more than sixty scientific databases, ships with prebuilt skills for genomics, proteomics, structural biology and chemistry, renders protein structures and chemical diagrams inline, and manages computing jobs across a laptop, a cluster or rented GPUs. Every figure it produces carries its full history: the code, the computing environment and the conversation that generated it, bundled so the result can be regenerated later.
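To make the idea of a figure that carries its own history concrete, here is a minimal sketch of such a provenance bundle. Anthropic has not described its bundle format here; the `ProvenanceRecord` class and its field names are hypothetical. The sketch simply stores the code, a snapshot of the environment and the conversation, then hashes a canonical encoding so anyone can check later that a regenerated figure came from the same inputs.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field

@dataclass
class ProvenanceRecord:
    """Hypothetical record of everything that produced one figure."""
    figure: str
    code: str
    conversation: list
    # Minimal environment snapshot; a real tool would also pin package versions.
    environment: dict = field(default_factory=lambda: {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    })

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so identical inputs give identical digests.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def to_manifest(self) -> str:
        # Bundle the record together with its own hash for later verification.
        return json.dumps({**asdict(self), "sha256": self.fingerprint()},
                          indent=2, sort_keys=True)

record = ProvenanceRecord(
    figure="fig1.png",
    code="plot_occurrences(df)",
    conversation=["user: map the occurrence records", "assistant: done"],
)
print(record.to_manifest())
```

Hashing the whole record means any change to the code, the environment or the conversation produces a different fingerprint, which is what lets a reader tell whether a result was truly regenerated or merely resembles the original.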
None of this makes the model smarter. It makes the model useful, which is the more valuable thing right now. A computational biologist with Claude Code and a GitHub account could assemble much of it herself, given a few weeks of wiring. Claude Science is the bet that doing this wiring once, properly, beats a thousand labs each doing it from scratch. The value is curation, and curation is what turns raw capability into science.
Built for the Bench, Open to the Field
The launch demo was a drug-discovery campaign. From a single sentence, Claude planned and ran a search for a molecule to stabilize the broken enzyme behind phenylketonuria, screened 2,200 compounds across 80 GPUs, narrowed them to four candidates and produced a go/no-go memo. Then it ran the same triage across 100 rare diseases at once. Why stop at 100, the presenter asked, when the same machinery could just as easily run 10,000?
It was an impressive performance, and it was entirely molecular. Every database, every prebuilt skill, every partner model points at the same kind of science: genes, proteins, small molecules, structures. OpenAI and Google have aimed their own scientific tools at the same target, pharmaceutical research, where the money is.
That leaves an enormous territory of science wide open. The earth and atmospheric sciences, environmental science, ecology, the social and behavioral sciences, much of epidemiology: none of it is configured yet, and the data these fields run on (biodiversity records, climate reanalysis, census files, remote sensing) are not among the sixty databases. The science that happens outside the wet lab, in the field, the watershed and the population, is the next frontier for a tool like this.
It is easy to picture. A harness like this one, pointed at my field, could pull species occurrence records from GBIF, lay them over climate reanalysis, fit a distribution model and flag the counties where a tick-borne pathogen is most likely to spread next season, then draft the surveillance brief and the figure to go with it. It could read every outbreak report from a region and reconstruct the transmission chain. It could do for a public health department what the demo did for a drug program, compressing weeks of assembly into an afternoon. The intelligence to do this already exists. What is missing is the harness, meaning the connectors and the skills, and my twenty-six-dollar experiment is a small proof that it can be done.
The engine is general. Anthropic aimed the first harness at pharmaceutical research, where the urgency and the budgets are, but nothing in the technology confines it there. The field sciences are an invitation, not an oversight.
Where Scientists Come In
The technology is exciting, and it will accelerate a great deal of science. It also clarifies where scientists matter most. Once generation is cheap, the work that remains is judgment. Auditing, validation and correction become the rate-limiting steps of nearly all work done at a screen, which is simply a description of what good science always was. Claude Science even ships with a reviewer agent that flags bad citations and mismatched numbers. For now it is the same model checking its own work, not an independent source of truth, but the direction is right.
The question I find most exciting is novelty. A model trained on the existing literature is exquisitely good at reflecting that literature back, and the risk is a regression toward a mediocre mean, a field talking to itself. But the same tool that maps what a field already believes can expose its gaps, the relationships no one has tested, the concepts that appear everywhere and are defined nowhere. For twenty-six dollars I found the edges of what my field has already written down. Finding what it has not yet imagined is the harder and more exciting problem, and for the first time it looks like one we can actually take up.

