Two high‑impact AI initiatives announced this week signalled progress in computational biology infrastructure. In a Nature Communications paper, researchers described a multimodal foundation model tuned for whole‑slide pathology that integrates image and knowledge embeddings to accelerate biomarker discovery and diagnostic workflows. The paper reported improved whole‑slide representation and potential for translation into clinical pathology pipelines. Separately, Parse Biosciences and Tahoe Therapeutics announced a collaboration to generate a 300‑million‑cell perturbation atlas for training AI models for therapeutic prediction. The dataset is intended to underpin virtual cell models that predict responses to genetic and chemical perturbations across tissues and disease states. Both projects illustrate growing investment in large, standardized biological datasets and foundation models that drug developers intend to use for target discovery and preclinical prediction. Together, they highlight a shift toward ‘data as infrastructure’ in drug discovery, with implications for reproducibility, model validation and regulatory acceptance of AI‑driven biomarkers.