When the CEO of Nvidia talks, decision-makers listen, from investors to business executives to Nobel Prize committees. Ever since Jensen Huang observed six months ago that “for the first time in human history” biology has an opportunity to move from scientific inquiry to engineering rigor, funding for life sciences and biotech companies in the U.S. has accelerated. The prime candidates for capitalizing on this opportunity are cell and gene therapies, which use life’s building blocks to treat, prevent, or potentially cure diseases.

The 2024 Nobel Prizes illustrate this coming together of multiple disciplines (molecular biology, neuroscience, computer science, computational statistics, physics, artificial intelligence) to engineer a new healthcare landscape.

The winners of the physiology or medicine prize, Victor Ambros and Gary Ruvkun, were investigating how different cell types develop, leading them to the discovery of microRNA, a new class of tiny RNA molecules that play a crucial role in gene regulation. One half of the chemistry prize went to David Baker, who developed computer software that could predict protein structures and then used it to reverse engineer the process: entering a desired protein structure and obtaining suggestions for its amino acid sequence, which enabled the creation of entirely new proteins. The other half of the chemistry prize went to Demis Hassabis and John Jumper, who used modern AI to predict the structures of all 200 million known proteins, giving researchers the tools to accomplish in minutes what previously took months, even years. The winners of the physics prize, John Hopfield and Geoffrey Hinton, paved the way to modern AI: machine learning using artificial neural networks, also known as deep learning.

“Biology is a very messy field ripe for a transformation into engineering,” says Jonathan Rosenfeld, co-founder and CTO of Somite.ai. Rosenfeld has already brought the predictive power of engineering into another messy, albeit much younger, field: modern AI.

When you use a specific engineering method or process, you can typically predict its impact on performance and on the desired outcome. With deep learning, there is a general notion, the “scaling hypothesis,” that increasing the resources used, applying more data and more computing power, will lead to better results. But, says Rosenfeld, “Are we close to the limits? Are we progressing at the maximum rate that we can progress? Are we even going in the right direction? We had no idea as recently as five years ago.”

From 2017 to 2021, Rosenfeld pursued a PhD at MIT, pioneering AI scaling laws in 2019 and answering the question of how performance depends on the quantity and quality of data, the size and type of the model, and the amount of computation deployed. Rosenfeld’s scaling laws have since been expanded by other pioneers, most notably OpenAI. AI scaling laws give the field the predictive power of engineering: You know in advance what you are going to get for the millions of dollars you invest in training a single model. “We can now predictably say what will improve performance and at what rate. We can answer the question ‘is this investment worthwhile?’” says Rosenfeld.
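To make the idea concrete: scaling laws in the literature typically express model error as a power law in a resource such as dataset size, fitted on small, cheap runs and extrapolated to large, expensive ones. The sketch below is a generic illustration with made-up numbers and a generic functional form, not Rosenfeld’s actual formulation:

```python
# Illustrative sketch: fitting a power-law scaling curve to synthetic data.
# The form err(n) = a * n**(-alpha) + c is a generic one from the scaling-law
# literature; all numbers below are invented for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, alpha, c):
    # Error falls as a power of dataset size n, flooring at irreducible error c.
    return a * n ** (-alpha) + c

# Hypothetical "measured" errors from small training runs at increasing sizes.
n_obs = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
err_obs = np.array([0.42, 0.31, 0.22, 0.17, 0.13])

params, _ = curve_fit(scaling_law, n_obs, err_obs, p0=(10.0, 0.3, 0.05))
a, alpha, c = params

# Extrapolate: what would 10x more data buy? This is the kind of advance
# prediction scaling laws provide before committing to a costly training run.
print(f"fitted exponent alpha = {alpha:.2f}, irreducible error ~ {c:.3f}")
print(f"predicted error at n = 1e7: {scaling_law(1e7, *params):.3f}")
```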

In addition, scaling laws and the predictions they yield lead to a better understanding of given results. They allow, says Rosenfeld, for asking, and answering, questions such as “Are we at the limit of efficiency? Could we do better? Could we advance faster? Do we understand what are the sources of the current limitations?”

After completing his PhD, Rosenfeld worked briefly at a couple of startups but was determined to launch his own, at the intersection of AI and biology. With Micha Breakstone, an accomplished AI entrepreneur, Rosenfeld founded Somite.ai while continuing his affiliation with MIT as the head of the FundamentalAI group. Joining Breakstone and Rosenfeld as co-founders were Olivier Pourquie, Professor of Genetics at Harvard Medical School and of Pathology at Brigham and Women’s Hospital; Allon Klein, Professor of Systems Biology at Harvard Medical School; Cliff Tabin, Professor and Chair of Genetics at Harvard Medical School; and Jay Shendure, Professor of Genome Sciences at the University of Washington.

This dream team of leading experts in stem cell biology, medicine, genetics, engineering, statistics, computer science, and machine learning has accomplished a lot in the first year of its “journey to transform stem cell biology into a compute-bound engineering challenge,” says Breakstone. It has received FDA Orphan Drug Designation and Rare Pediatric Disease Designation for its first program, targeting Duchenne muscular dystrophy, the most common hereditary neuromuscular disease (one in 3,500 male children is born with it). “It’s the first step towards a future where humans have spare parts—where we can control the creation of any cell type by mastering nature’s cellular programs,” adds Rosenfeld.

Somite.ai has so far raised over $10 million in a pre-seed round, including recent additional funding from Astellas Venture Management (wholly owned by Astellas Pharma) and Montage Ventures. It has also launched the first phase of its AI platform, AlphaStem, and a second program, targeting metabolic disorders.

“We’ve been making marvelous strides in biology but AlphaFold is about just the building blocks,” says Rosenfeld. “If you look at the system level, such as the cell, it’s like a supercomputer that is running in parallel many programs. Understanding the language, the operating language of these cells and how they execute is an open problem that we are now at the cusp of being able to tackle with this intersection of AI—driven by data scaling—and biology’s domain expertise.”

AlphaFold was the result of applying novel machine learning and search algorithms to decades of accumulated data on proteins’ amino acid sequences and structures. Similarly, biologists can now actually observe the state of a cell at any given time and collect this and other types of data they didn’t have before, adding to decades of accumulated genetic and system-level data.

Biologists today have the ability to map in great detail how cell differentiation evolves in an embryo. “This is an expensive process, but it has reached the point in time where it is not prohibitively expensive,” says Rosenfeld. What is missing is the programming language that controls this differentiation process and guides the cell through all the branching points. What is needed is a navigation program for cell differentiation, one that identifies the required signals (e.g., chemicals or proteins) at any given point in time. “If we know how to orchestrate, to conduct this symphony, to master this signaling language,” says Rosenfeld, “we could produce at will any cell type.”
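One way to picture that “navigation program” is as a search over a branching graph in which each edge is the signal required to move a cell from one state to the next. The sketch below is purely conceptual; the states and signals are hypothetical placeholders, not Somite.ai’s protocol or real developmental biology:

```python
# Conceptual sketch only: cell differentiation framed as navigating branching
# points, where each transition requires a specific signal. All states and
# signal names below are invented placeholders.

# Each branching point maps an applied signal to the resulting cell state.
DIFFERENTIATION_GRAPH = {
    "pluripotent":   {"signal_A": "mesoderm_like", "signal_B": "ectoderm_like"},
    "mesoderm_like": {"signal_C": "somite_like"},
    "somite_like":   {"signal_D": "muscle_progenitor_like"},
}

def navigate(start, target):
    """Depth-first search for the sequence of signals that steers a cell
    from `start` to `target`; returns None if no path exists."""
    stack = [(start, [])]
    seen = {start}
    while stack:
        state, signals = stack.pop()
        if state == target:
            return signals
        for signal, nxt in DIFFERENTIATION_GRAPH.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, signals + [signal]))
    return None

print(navigate("pluripotent", "muscle_progenitor_like"))
# -> ['signal_A', 'signal_C', 'signal_D']
```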

The team of experts at Somite.ai knows how to efficiently and accurately produce somites, the embryonic structures responsible for producing the musculoskeletal system and related tissues. Deciphering the language of how these cells grow into muscle will allow it to develop cells that, once injected into the human body, will repair it by growing the missing muscle tissue.

A number of companies today, such as Vertex and Novo Nordisk, have demonstrated the first clinical successes in cell therapy. They have spent considerable time and resources to produce specific cell types, doing it in what Rosenfeld calls an “artisanal way.” Up until now, he says, “there hasn’t been a scalable way to produce any cell type at will. That is the gap that we are addressing.”

What is needed to do it “at scale” is a massive amount of signaling data and a model that predicts what the effect of a specific signal on the cell will be. Somite.ai has a method for economically producing large quantities of data, enough to allow its AI programs to learn on their own (self-supervised, rather than being supervised by experts) and improve their performance. Experts are “slow and expensive and I want to do better than experts,” says Rosenfeld. “That is where search coupled with deep learning can make a big difference when you have many relevant data points that allow for self-supervised learning.”
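The appeal of self-supervision here is that the training target comes from the data itself: a model learns to predict the next measured cell state from the current state and the applied signal, so no expert labeling is required. Below is a generic sketch of that setup, with synthetic data and a deliberately simple linear model standing in for whatever architecture Somite.ai actually uses:

```python
# Generic self-supervised sketch: predict a cell's next measured state from
# its current state plus the applied signal. The "label" is just the next
# observation, so no expert annotation is needed. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, SIGNAL_DIM, N = 8, 3, 500

# Synthetic dataset: (current state, signal) -> next state, plus noise.
true_W = rng.normal(size=(STATE_DIM + SIGNAL_DIM, STATE_DIM))
states = rng.normal(size=(N, STATE_DIM))
signals = rng.normal(size=(N, SIGNAL_DIM))
inputs = np.hstack([states, signals])
next_states = inputs @ true_W + 0.01 * rng.normal(size=(N, STATE_DIM))

# Linear predictor trained by gradient descent on mean-squared error.
W = np.zeros_like(true_W)
for step in range(2000):
    pred = inputs @ W
    grad = inputs.T @ (pred - next_states) / N
    W -= 0.05 * grad

print("final MSE:", float(np.mean((inputs @ W - next_states) ** 2)))
```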

The history of modern computing is rife with examples of what reinforcement learning expert Rich Sutton has called “the bitter lesson,” that “the only thing that matters in the long run is the leveraging of computation.” Says Rosenfeld: “As compute resources and data scale, the human expertise that crafted very clever solutions in the data-dearth and compute-restricted setting is eclipsed. If we have enough data and we understand how to pose the problem well, we can make increasingly more progress than the best experts in any domain.”
