In her 2018 Nobel Lecture, Berkeley alum and Nobel laureate Frances Arnold, Ph.D. ’85, remarked that “today, we can for all practical purposes read, write, and edit any sequence of DNA, but we cannot compose it.” Referencing CRISPR, the gene-editing tool codeveloped by Berkeley’s Jennifer Doudna, who shared the 2020 Nobel Prize in Chemistry, Arnold added, “maybe we can cut and paste pieces from nature’s compositions, but we do not know how to write the bars for a single enzymatic passage.” Only evolution could do that.
Until now.
In February 2024, scientists from the Arc Institute, a nonprofit research organization founded in collaboration with Berkeley, UC San Francisco, and Stanford, unveiled Evo, an artificial intelligence model trained on billions of lines of genetic code that could “interpret and generate genomic sequences at a vast scale,” effectively writing whole genomes for single-celled organisms. Some called it ChatGPT for DNA.
As if that weren’t big enough news, a year later the tool got a major upgrade thanks to a new partnership with chipmaker Nvidia. Evo 2 goes beyond reading and writing genomes for simpler organisms to doing the same for humans, plants, and other eukaryotic life. The tool could have immediate applications for health care; among other capabilities, Evo 2 has demonstrated that it can effectively predict whether specific mutations in a breast cancer–associated gene are likely to cause disease or be benign in human patients.
Evo 2 is the largest publicly available AI model for biology, according to the researchers, and is fully open source. “We want to empower the research community to build on top of these foundation models,” Dr. Patrick Hsu ’10, Arc Institute cofounder and assistant professor of bioengineering at Berkeley, recently told GEN Biotechnology. “We’re really looking forward to how scientists and engineers build on this ‘app store’ for biology.”