SpatialBench introduces a suite of 146 verifiable problems drawn from real spatial biology workflows to evaluate AI agents on tasks ranging from quality control (QC) to spatial analysis, across platforms including Vizgen MERFISH and 10x Visium. The paper argues that prior benchmarks rewarded textbook answers rather than robust, reproducible analysis of messy experimental data. Spatial assays measure molecular signals, such as transcript counts, together with their spatial coordinates in tissue; the benchmark aims to close the demo‑to‑deployment gap by grading agents on concrete, checkable outcomes relevant to lab decision‑making.
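The paragraph above does not spell out how "verifiable" grading works in the harness itself. As a minimal sketch of what such grading could look like, assume each task pairs a prompt over real assay data with a ground-truth numeric answer and a tolerance for pipeline drift; the `Problem` dataclass, `grade` function, field names, and example values below are all hypothetical illustrations, not SpatialBench's actual API.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    """One benchmark task: a prompt over real assay data plus a checkable answer.

    All names and values here are hypothetical, for illustration only.
    """
    task_id: str
    prompt: str
    expected: float        # ground-truth value, e.g. median transcripts per cell after QC
    rel_tol: float = 0.01  # relative tolerance to absorb minor platform/pipeline drift

def grade(problem: Problem, agent_answer: float) -> bool:
    """Pass iff the agent's numeric result matches ground truth within tolerance."""
    denom = max(abs(problem.expected), 1e-12)  # guard against division by zero
    return abs(agent_answer - problem.expected) / denom <= problem.rel_tol

# Usage: a made-up QC task over a MERFISH run.
qc_task = Problem(
    task_id="qc-merfish-001",
    prompt="Report the median transcripts per cell after filtering cells with <20 counts.",
    expected=412.0,
)
print(grade(qc_task, agent_answer=414.5))  # True: within 1% of ground truth
```

Grading against a fixed numeric target with a stated tolerance is one simple way to make outcomes reproducible across runs while still tolerating the small numeric differences that real analysis environments produce.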