Two researchers, Adrian Barnett (Queensland University of Technology) and Alexander Gibson, have traced flawed Kaggle datasets into papers that trained clinical models on them, and have reported resulting retractions. Their investigation focused on stroke and diabetes datasets containing duplicated or unrelated images, including “droopy” entries showing celebrities and mismatched medical content. The pair documented the data lineage from open-source Kaggle repositories into peer-reviewed work, and multiple retractions have already been tied to their findings, which are posted as a preprint on medRxiv. One dataset has been removed from Kaggle, while the “droopy” set reportedly remains online. The reporting underscores the risk that poor data provenance can carry over into automated clinical decision systems and cause reputational or regulatory damage for authors and platforms.