Tahoe, Arc and Biohub join to build massive perturbation-rich virtual‑cell dataset

Tahoe Therapeutics, the Chan Zuckerberg Biohub and the Arc Institute have committed multimillion-dollar resources to create an open-source perturbation dataset designed to train next-generation virtual cell models. The project will generate more than 120 million single-cell datapoints across roughly 225,000 perturbations using Tahoe’s Mosaic technology and other platforms. The dataset will span ~50 cell lines and ~1,400 chemical scaffolds at three doses each, plus 100 cytokine perturbations, and will include patient-relevant metadata. Partners will have an exclusive access period before public release. The collaboration is positioned to improve how well AI models generalize in drug discovery and to provide a shared benchmark for the field.
