SpatialBench, a new benchmark suite, evaluates computational agents on 146 verifiable spatial biology problems drawn from real analysis workflows across five platforms (Vizgen MERFISH, 10x Visium, Xenium, Takara Seeker, AtlasXomics). The project aims to close the demo‑to‑deployment gap by grading agents on tasks scientists actually perform, from quality control (QC) to spatial differential expression. The authors stress that current agent demos overfit to textbook cases and advocate for benchmarks tied to reproducible biological outcomes. SpatialBench provides standardized problems, ground truths, and documented failure modes so labs and tool builders can measure progress, reduce downstream experimental risk, and accelerate the adoption of automation in spatial assays.
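
To make "verifiable" concrete, the sketch below shows one way a graded task and its scoring check could be represented: a task record with a ground-truth answer and a tolerance, plus a harness that scores an agent callable over a set of tasks. This is a minimal illustration under assumed conventions; the schema, field names, and numbers are hypothetical and do not reflect SpatialBench's actual format or API.

```python
from typing import Callable

# Hypothetical task record; field names and values are illustrative only.
TASK = {
    "task_id": "visium-qc-001",
    "platform": "10x Visium",
    "category": "QC",
    "prompt": "Report the number of spots retained after removing spots "
              "with fewer than 200 detected genes.",
    "ground_truth": 3921,   # numeric answer the agent must recover
    "tolerance": 0,         # 0 = exact match required
}

def grade(task: dict, answer: float) -> bool:
    """Check an agent's numeric answer against the stored ground truth."""
    return abs(answer - task["ground_truth"]) <= task["tolerance"]

def run_benchmark(tasks: list[dict], agent: Callable[[dict], float]) -> float:
    """Score an agent callable over a list of task records; returns accuracy."""
    correct = sum(grade(task, agent(task)) for task in tasks)
    return correct / len(tasks)

if __name__ == "__main__":
    # Stand-in agent that always returns 3921; a real run would invoke
    # an actual analysis agent against the task's dataset.
    dummy_agent = lambda task: 3921
    print(f"accuracy = {run_benchmark([TASK], dummy_agent):.2f}")
```

The point of the sketch is the shape of the contract, not the implementation: each problem carries a machine-checkable answer, so agent performance can be scored automatically rather than judged by demo quality.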