Published Date : 27/06/2025
Yusuf Roohani, PhD, machine learning group lead at the Arc Institute, is among a team of researchers training artificial intelligence (AI) models with transcriptome data to predict how cell gene expression patterns change with different cell states. These so-called virtual cells could help researchers discover new drugs capable of shifting cells from “diseased” to “healthy” with fewer off-target effects to boost clinical success rates.
However, building a virtual cell is not an easy feat. “When you look at cells, they are living dynamic systems,” said Roohani in an interview with GEN Edge. “Cells are constantly in flux, they’re messy, and they’re dependent on the experiment.”
Virtual cell models must account for biological complexity, such as the cell type, genetic background, and cell context. In addition, many existing single-cell datasets are affected by substantial technical noise, including limited reproducibility of perturbation effects across independent experiments, which diminishes model performance. Without standardized benchmarks and purpose-built datasets, the field has struggled to evaluate whether virtual cell models are capturing generalizable biological insights rather than dataset-specific patterns.
In a step toward virtual cell benchmarking and acceleration, the Arc Institute has announced the inaugural “Virtual Cell Challenge,” a public competition, sponsored by Nvidia, 10x Genomics, and Ultima Genomics, with a grand prize worth $100,000 for the machine learning model that best predicts how cells will respond to genetic perturbations. The challenge is described in a new commentary published in Cell with Roohani as lead author.
The initiative follows in the footsteps of the Critical Assessment of protein Structure Prediction (CASP) competition, the biennial experiment that assesses the latest state-of-the-art models in structural biology. Patrick Hsu, PhD, co-founder and core investigator at Arc, emphasized that CASP competitions have transformed protein structure prediction over 25 years and ultimately enabled breakthroughs, such as the Nobel Prize-winning algorithm, AlphaFold. “We believe Arc can use the same approach to accelerate progress toward comprehensive virtual cells that could fundamentally change how we study biology and identify targets to better treat complex diseases,” said Hsu in a public release.
Emma Lundberg, PhD, associate professor at Stanford University and the co-director of the Human Protein Atlas, concurs that establishing benchmarks has been a key challenge for evaluating and comparing virtual cell models. “I expect that [Arc’s] challenge will help to align the community and accelerate the work toward performant and useful virtual cell models. Hopefully, it’s the first of many standardized challenges in this space,” she told GEN Edge.
Theofanis Karaletsos, senior director of AI at the Chan Zuckerberg Initiative (CZI), is actively developing virtual cell models and has advanced CZI’s recent releases, such as scGenePT for single-cell perturbations and TranscriptFormer for cross-species predictions. “At CZI, we’re focused on building cutting-edge models and providing standardized evaluation frameworks to deepen the scientific community’s understanding of cells,” Karaletsos told GEN Edge. “Community benchmarks are important, and we believe open competitions like Arc’s are a powerful mechanism to accelerate innovation and collective progress.”
A Palo Alto-based non-profit research organization, the Arc Institute was founded in 2021 by Hsu and Silvana Konermann, PhD, assistant professor of biochemistry at Stanford University and Arc’s current executive director. Since then, the institute has become known for making big bets on data-driven AI. In collaboration with Nvidia earlier this year, Arc released Evo 2, which it described at the time as the largest publicly available AI model for biology.
As a key challenge for AI models is making predictions outside of the training data, the Arc competition will evaluate how well competing virtual cells can predict changes in gene activity when generalizing to a new cell context. For the inaugural competition, Arc has generated a new single-cell transcriptomics dataset of 300,000 H1 human embryonic stem cells (H1 hESCs) with 300 genetic perturbations, which will be deployed throughout the competition in segments for fine-tuning, validation, and testing.
Models will be evaluated on the following three metrics: 1) performance in predicting differentially expressed genes; 2) performance discriminating between different perturbation effects; and 3) general error in terms of deviation from expression counts. The interim performance of competitor models will be shared on a live leaderboard during the middle phase of the competition. The three teams with the top models will receive prizes valued at $100,000, $50,000, and $25,000, combining cash awards and NVIDIA DGX Cloud credits.
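To make the three metric families above concrete, here is a toy sketch in Python. It is not the challenge's official scoring code, and every function name in it is hypothetical. It assumes each perturbation is summarized as one mean expression profile, approximates "differentially expressed" genes as the k genes with the largest absolute change from control (a real pipeline would use a statistical test), and uses a simple L1 distance for the discrimination and error terms.

```python
from typing import Dict, List, Set

Profile = List[float]  # one mean expression value per gene (assumed summary)


def top_changed_genes(profile: Profile, control: Profile, k: int) -> Set[int]:
    """Indices of the k genes whose expression moved most from control.

    Stand-in for a differential-expression call; not a statistical test.
    """
    deltas = [abs(p - c) for p, c in zip(profile, control)]
    ranked = sorted(range(len(deltas)), key=lambda g: deltas[g], reverse=True)
    return set(ranked[:k])


def de_overlap(pred: Profile, obs: Profile, control: Profile, k: int) -> float:
    """Fraction of the observed top-k changed genes that the prediction recovers."""
    shared = top_changed_genes(pred, control, k) & top_changed_genes(obs, control, k)
    return len(shared) / k


def l1_distance(a: Profile, b: Profile) -> float:
    """Sum of absolute per-gene differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def discrimination_score(name: str, preds: Dict[str, Profile],
                         obs: Dict[str, Profile]) -> float:
    """1.0 if the prediction for `name` is closer to its own observed profile
    than to every other perturbation's observed profile; lower otherwise."""
    own = l1_distance(preds[name], obs[name])
    closer = sum(
        1 for other in obs
        if other != name and l1_distance(preds[name], obs[other]) < own
    )
    n_others = len(obs) - 1
    return 1.0 - closer / n_others if n_others else 1.0


def mean_absolute_error(pred: Profile, obs: Profile) -> float:
    """Average per-gene absolute deviation between prediction and observation."""
    return l1_distance(pred, obs) / len(obs)


# Made-up illustration data: four genes, two hypothetical knockouts.
control = [1.0, 1.0, 1.0, 1.0]
observed = {"geneA_KO": [0.1, 1.0, 1.0, 2.0], "geneB_KO": [1.0, 0.2, 1.0, 1.0]}
predicted = {"geneA_KO": [0.3, 1.0, 1.1, 1.8], "geneB_KO": [1.0, 0.5, 0.9, 1.0]}

for pert in observed:
    print(
        pert,
        de_overlap(predicted[pert], observed[pert], control, k=2),
        discrimination_score(pert, predicted, observed),
        mean_absolute_error(predicted[pert], observed[pert]),
    )
```

The first two scores reward getting the pattern of a perturbation right, while the third penalizes raw deviation in expression values, so a model can score well on one and poorly on another.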
Registration for the competition is now open. Individual contributors as well as teams from academic institutions, biotechnology companies, and independent research organizations are eligible to participate. Final rankings will be determined solely by model performance on the final test set, which will be released in late October, one week prior to the final submission deadline. Winners will be announced in December.
As a baseline, Virtual Cell Challenge competitors will initially go head-to-head with Arc’s first virtual cell model, STATE, which is designed to predict how various stem cells, cancer cells, and immune cells respond to drugs, cytokines, or genetic perturbations. STATE was released earlier this week for non-commercial use and is described in a preprint posted on Arc’s website that has not yet been peer reviewed. According to the authors, STATE improved discrimination of perturbation effects on multiple large datasets by over 50% and identified differentially expressed genes across genetic, signaling, and environmental perturbations.
Q: What is the Virtual Cell Challenge?
A: The Virtual Cell Challenge is a public competition sponsored by Nvidia, 10x Genomics, and Ultima Genomics, aimed at accelerating the development of AI models that predict cell responses to genetic perturbations. The competition offers a grand prize of $100,000.
Q: Who can participate in the Virtual Cell Challenge?
A: Individual contributors as well as teams from academic institutions, biotechnology companies, and independent research organizations are eligible to participate in the Virtual Cell Challenge.
Q: What is the main goal of the Virtual Cell Challenge?
A: The main goal of the Virtual Cell Challenge is to develop AI models that can accurately predict how cells will respond to genetic perturbations, which could help in the discovery of new drugs and treatments for complex diseases.
Q: What are the evaluation metrics for the competition?
A: Competing models will be evaluated on three metrics: 1) performance in predicting differentially expressed genes; 2) performance discriminating between different perturbation effects; and 3) general error in terms of deviation from expression counts.
Q: What is the significance of the Arc Institute in this challenge?
A: The Arc Institute, a non-profit research organization, is the organizer of the Virtual Cell Challenge. Founded in 2021, the institute is known for its focus on data-driven AI and has previously collaborated with Nvidia to release Evo 2, which it described at the time as the largest publicly available AI model for biology.