Researchers at EPFL compared advanced artificial intelligence models with simpler statistical approaches for predicting how cells respond to genetic perturbations. Analyzing data from ten experiments, they found that the simple methods often matched or outperformed the sophisticated AI models, which calls into question the validity of current evaluation metrics. The work, published in Nature Biotechnology, shows how biases in experimental design can distort measured model accuracy, and it introduces Systema, a tool designed to gauge predictive performance in genetic perturbation studies more reliably.