SpatialBench is a new benchmark suite that evaluates agent‑based tools on realistic spatial biology analysis workflows, revealing performance gaps between demo scenarios and deployment‑grade tasks. The authors assembled 146 verifiable problems spanning multiple spatial platforms (Vizgen MERFISH, 10x Visium, Xenium, AtlasXomics) to test quality control (QC), clustering, cell typing, and spatial analyses. Results show that current agents often fail on messy, context‑dependent datasets, and the authors argue that progress requires biologically grounded benchmarks. The resource aims to steer tool development toward reproducible, deployment‑ready analytics for spatial assays.
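To make "verifiable problems" concrete, a clustering task can be graded automatically by comparing an agent's cluster assignments against reference labels with a partition-agreement score such as the Adjusted Rand Index. The sketch below is illustrative only: the example labels and the use of ARI as the grading metric are assumptions, not SpatialBench's actual scoring scheme.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same items (pure stdlib).

    1.0 means the partitions agree up to relabeling; values near 0
    (or below) indicate chance-level or worse agreement.
    """
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))  # contingency counts
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_comb = sum(comb(v, 2) for v in pairs.values())
    sum_rows = sum(comb(v, 2) for v in rows.values())
    sum_cols = sum(comb(v, 2) for v in cols.values())
    total = comb(n, 2)
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: all items in one cluster
        return 1.0
    return (sum_comb - expected) / (max_index - expected)

# Hypothetical reference annotation vs. an agent's cluster output;
# cluster names need not match, only the grouping structure matters.
reference = ["Tcell", "Tcell", "Bcell", "Bcell", "Tumor", "Tumor"]
predicted = [0, 0, 1, 1, 2, 2]
print(f"ARI = {adjusted_rand_index(reference, predicted):.2f}")  # → ARI = 1.00
```

A threshold on such a score (e.g. ARI above some cutoff) is one plausible way a benchmark could mark a clustering task as solved without manual review.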