Researchers have introduced SpatialBench, a suite of 146 verifiable problems drawn from authentic spatial biology workflows, to benchmark AI agents on real analysis tasks. The dataset spans five experimental platforms (MERFISH, Seeker, Visium, Xenium, DBIT‑seq) and seven task categories: quality control (QC), normalization, dimensionality reduction, clustering, cell typing, differential expression, and spatial analysis. The aim is to close the “demo‑to‑deployment” gap for agents: SpatialBench is presented as a reproducible yardstick so that labs and vendors can quantify whether an agent truly produces reliable, verifiable biological results rather than polished demonstrations. (An “agent” here refers to an AI system designed to perform end‑to‑end data analysis and decision tasks in computational biology.)
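To make the seven task categories concrete, the sketch below shows the kind of workflow they map onto, using Scanpy and Squidpy as stand-in tools. This is an illustrative assumption, not part of SpatialBench itself: the benchmark does not prescribe these libraries, and the input file name, thresholds, and parameters here are placeholders.

```python
# Minimal, illustrative spatial analysis pipeline (Scanpy + Squidpy).
# Assumes an AnnData file with spatial coordinates in adata.obsm["spatial"];
# the file name and all parameter values are hypothetical.
import scanpy as sc
import squidpy as sq

adata = sc.read_h5ad("visium_sample.h5ad")  # hypothetical input

# QC: compute per-cell/per-gene metrics and drop low-quality observations
sc.pp.calculate_qc_metrics(adata, inplace=True)
sc.pp.filter_cells(adata, min_counts=500)
sc.pp.filter_genes(adata, min_cells=3)

# Normalization: library-size scaling followed by log transform
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Dimensionality reduction: highly variable genes, then PCA
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.pca(adata, n_comps=50)

# Clustering: kNN graph in PCA space, Leiden community detection
sc.pp.neighbors(adata, n_neighbors=15)
sc.tl.leiden(adata, key_added="cluster")

# Cell typing: marker-based annotation would go here (placeholder mapping)
adata.obs["cell_type"] = adata.obs["cluster"].astype(str)

# Differential expression between clusters
sc.tl.rank_genes_groups(adata, groupby="cluster", method="wilcoxon")

# Spatial analysis: spatial neighborhood graph and Moran's I autocorrelation
sq.gr.spatial_neighbors(adata)
sq.gr.spatial_autocorr(adata, mode="moran")
```

Each SpatialBench problem targets one such step with a verifiable expected result, so an agent's output can be checked against ground truth rather than judged on appearance.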