SpatialBench, a new benchmark suite introduced by software and computational biology researchers, has revealed major weaknesses in current agent‑based tools when they are applied to real spatial biology workflows. The benchmark comprises 146 verifiable problems spanning platforms such as MERFISH and 10x Visium and tasks such as QC, cell typing, and spatial analysis, and it is designed to test agents on the messy, decision‑point work scientists actually perform. The authors show that many models perform well on curated demos but fail on real, noisy datasets where biological validation is required. SpatialBench provides a reproducible set of metrics that developers and labs can use to evaluate progress toward deployable analysis agents, and it sets a higher bar for tools intended to replace specialist bioinformatics labor.
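The article does not describe the benchmark's harness code, but the "verifiable problems" framing suggests a structure in which each task pairs a dataset and prompt with a programmatic check, and an agent is scored by the fraction of checks it passes per task category. The sketch below is a hypothetical illustration of that idea in Python; the `Task` and `evaluate` names, the tolerance-based verifier, and the example numbers are assumptions for illustration, not SpatialBench's actual API or data.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    """One verifiable benchmark problem: a dataset/platform, a prompt, and a checker.

    Hypothetical structure; not SpatialBench's real schema.
    """
    task_id: str
    platform: str                      # e.g. "MERFISH" or "10x Visium"
    category: str                      # e.g. "QC", "cell typing", "spatial analysis"
    prompt: str                        # instruction given to the agent
    verify: Callable[[object], bool]   # programmatic check of the agent's output

def evaluate(tasks: list[Task], agent: Callable[[str], object]) -> dict[str, float]:
    """Run an agent on every task and report the pass rate per category."""
    results: dict[str, list[bool]] = {}
    for task in tasks:
        answer = agent(task.prompt)
        results.setdefault(task.category, []).append(bool(task.verify(answer)))
    return {cat: sum(flags) / len(flags) for cat, flags in results.items()}

# Example task: the verifier accepts any answer within a tolerance of a
# reference value (the numbers here are made up, not taken from SpatialBench).
qc_task = Task(
    task_id="qc-001",
    platform="10x Visium",
    category="QC",
    prompt="Report the fraction of spots removed by standard QC filtering.",
    verify=lambda answer: isinstance(answer, float) and abs(answer - 0.12) < 0.02,
)

if __name__ == "__main__":
    scores = evaluate([qc_task], agent=lambda prompt: 0.13)  # stand-in agent
    print(scores)  # {'QC': 1.0}
```

Breaking the pass rate out per category mirrors the QC / cell typing / spatial analysis split described above, which is what lets a benchmark like this show where agents fail rather than reporting a single aggregate score.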