Published Date : 25/06/2025
Nearly 25 years after scientists completed a draft human genome sequence, many of its 3.1 billion letters remain a puzzle. The 98% of the genome that is not made of protein-coding genes, but which can influence their activity, is especially perplexing. An artificial intelligence (AI) model developed by Google DeepMind in London could help scientists make sense of this ‘dark matter’ and understand its role in cellular functions and in diseases such as cancer. The model, called AlphaGenome, is described in a preprint released on June 25.
Making sense of the genome is one of the most fundamental problems not just in biology, but in all of science, according to Pushmeet Kohli, the company’s head of AI for science. AlphaGenome is a ‘sequence to function’ model: it takes long stretches of DNA and predicts various properties, such as the expression levels of the genes they contain and how those levels could be affected by mutations.
“I think it is an exciting leap forward,” says Anshul Kundaje, a computational genomicist at Stanford University in Palo Alto, California, who has had early access to AlphaGenome. “It is a genuine improvement in pretty much all current state-of-the-art sequence-to-function models.”
When DeepMind unveiled AlphaFold 2 in 2020, it made significant strides in solving a problem that had challenged researchers for decades: determining how a protein’s sequence contributes to its three-dimensional shape. Working out what DNA sequences do is a different kind of problem, because there is no single answer comparable to the 3D structure that AlphaFold delivers. A single stretch of DNA can have numerous, interconnected roles, from attracting the cellular machinery that transcribes a nearby gene into an RNA molecule to influencing where, when, and to what extent gene expression occurs. Many DNA sequences influence gene activity by altering a chromosome’s 3D shape, either restricting or easing access for the machinery that does the transcription.
Biologists have been working on this question for decades using various computational tools. In the last decade, scientists have developed dozens of AI models to make sense of the genome. Many of these have focused on individual tasks, such as predicting levels of gene expression or determining how exons, the modular segments of individual genes, are cut and pasted into distinct proteins. However, scientists are increasingly interested in ‘all in one’ tools for interpreting DNA sequences.
AlphaGenome is one such model. It can take inputs of up to one million DNA letters, a stretch that could include a gene and myriad regulatory elements, and make thousands of predictions about numerous biological properties. In many cases, AlphaGenome’s predictions are sensitive to single-DNA-letter changes, meaning that scientists can predict the consequences of mutations.
In one example, DeepMind researchers applied the AlphaGenome model to diverse mutations identified in previous studies in people with a type of leukemia. The model accurately predicted that the non-coding mutations indirectly activated a nearby gene that is a common driver of this cancer.
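The single-letter sensitivity described above is what makes mutation scoring possible: predict a property for the reference sequence, predict it again with one letter changed, and compare the two. The Python sketch below illustrates only that comparison. It is not the AlphaGenome API; `toy_predict` is a made-up stand-in that returns the fraction of G and C letters as a fake "expression score", and every function name here is an assumption introduced for illustration.

```python
# Illustrative sketch only: toy_predict is a hypothetical stand-in,
# NOT AlphaGenome and NOT a real biological model.

MAX_INPUT_LENGTH = 1_000_000  # input limit quoted in the article


def toy_predict(sequence: str) -> float:
    """Fake 'model': return the GC fraction of the sequence as a score."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) > MAX_INPUT_LENGTH:
        raise ValueError("sequence exceeds the maximum input length")
    gc_count = sum(base in "GC" for base in sequence.upper())
    return gc_count / len(sequence)


def apply_substitution(sequence: str, position: int, alt_base: str) -> str:
    """Return a copy of the sequence with one letter replaced (0-based)."""
    if len(alt_base) != 1 or alt_base not in "ACGT":
        raise ValueError("alt_base must be one of A, C, G, T")
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    return sequence[:position] + alt_base + sequence[position + 1:]


def variant_effect(sequence: str, position: int, alt_base: str,
                   predict=toy_predict) -> float:
    """Score a single-letter change as prediction(alt) - prediction(ref)."""
    mutated = apply_substitution(sequence, position, alt_base)
    return predict(mutated) - predict(sequence)


if __name__ == "__main__":
    reference = "ATATATATAT"
    # Change the letter at index 2 from A to G and compare the two scores.
    effect = variant_effect(reference, 2, "G")
    print(f"{effect:+.2f}")  # prints +0.10
```

A real sequence-to-function model would return many predictions per input rather than one number, but the reference-versus-alternate comparison would take the same shape; passing a different `predict` function into `variant_effect` is how the toy scorer would be swapped out.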
Despite these advances, AlphaGenome has limits. Its predictions offer valuable insights, but more research is needed to validate them and to work out how to apply them in practice. Even so, the technology could have a significant impact on medical research and treatment by giving scientists a new tool for exploring the complexities of the human genome.
Q: What is the 'dark matter' in DNA?
A: The 'dark matter' in DNA refers to the 98% of the human genome that does not code for proteins but can influence the activity of protein-coding genes and, through them, cellular functions.
Q: What is AlphaGenome?
A: AlphaGenome is an AI model developed by Google DeepMind that predicts various biological properties of DNA sequences, helping scientists understand the non-coding regions of the genome.
Q: How does AlphaGenome work?
A: AlphaGenome takes long stretches of DNA and predicts properties such as gene expression levels and the effects of mutations, providing insights into the complex roles of non-coding DNA.
Q: What are some potential applications of AlphaGenome?
A: AlphaGenome can help in understanding the genetic basis of diseases, predicting the consequences of mutations, and improving our knowledge of cellular processes, potentially leading to new treatments and therapies.
Q: What are the limitations of AlphaGenome?
A: While AlphaGenome provides valuable insights, it is still limited in its ability to fully interpret all aspects of DNA sequences. More research is needed to validate and apply its predictions in real-world scenarios.