Illumina has launched the Billion Cell Atlas, a genome-wide genetic perturbation dataset intended to accelerate AI-driven drug discovery. The company said AstraZeneca, Merck and Eli Lilly are founding participants. Illumina plans a five-billion-cell effort over three years. Separately, Tahoe Therapeutics, the Arc Institute and Biohub announced a commitment to generate a large, perturbation-rich single-cell dataset to fuel virtual cell models and open-source resources.

Both initiatives center on large, perturbation-heavy single-cell data for training and validating AI models used in target validation and virtual cell simulation. Illumina is positioning its Atlas as an industrial resource for pharma model training. Tahoe, Arc and Biohub aim to broaden the diversity of perturbations and patient contexts to improve model generalizability. The mix of large pharmaceutical partners and multi-institution commitments suggests that commercial and open data strategies will coexist as companies race to supply the foundational datasets for AI-driven discovery.

A “virtual cell” is an AI model trained on single-cell transcriptomic responses to perturbations. It predicts how cells will respond to genetic or chemical interventions. Expect near-term use in target triage, mechanism validation and preclinical prioritization, with longer-term work aimed at in silico efficacy prediction.