New research suggests that natural language processing (NLP) can outperform ICD-10 coding at capturing clinically relevant details from healthcare records. The finding points to a shift in how organizations may extract usable clinical signals from text, especially when codes miss context or granularity. The work indicates improved clinical-data understanding for downstream applications such as patient stratification, real-world evidence generation, and more precise cohort building. The key change is methodological: rather than relying solely on standardized billing codes, NLP can capture the meaning embedded in clinician documentation. For biotech, this matters because high-quality labels drive better modeling for trial matching, safety monitoring, and biomarker discovery. If validated across health systems, NLP-based extraction could materially change how sponsors build datasets. Key industry takeaway: NLP is challenging the dominance of ICD-10 as a data capture layer for clinical analytics.
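
To make the methodological contrast concrete, here is a minimal sketch of how a coded record and its free-text note can carry different amounts of detail. It is a toy rule-based illustration, not the method used in the research: the patient record, the `SYMPTOMS` vocabulary, the negation cues, and the `extract_symptoms` helper are all hypothetical, and production clinical NLP would rely on far more robust models and lexicons.

```python
import re

# Hypothetical patient record: one ICD-10 code plus a free-text clinician note.
record = {
    "icd10_codes": ["E11.9"],  # Type 2 diabetes mellitus without complications
    "note": (
        "Patient with type 2 diabetes. Reports worsening numbness in both feet. "
        "No chest pain. Denies shortness of breath."
    ),
}

# Illustrative vocabulary and negation cues (assumptions, not a clinical lexicon).
SYMPTOMS = ["numbness", "chest pain", "shortness of breath"]
NEGATION_CUES = re.compile(r"\b(no|denies|without)\b", re.IGNORECASE)


def extract_symptoms(note):
    """Map each mentioned symptom to 'present' or 'negated'.

    A sentence containing any negation cue marks every symptom in it as negated.
    """
    findings = {}
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        negated = bool(NEGATION_CUES.search(sentence))
        for symptom in SYMPTOMS:
            if symptom in sentence.lower():
                findings[symptom] = "negated" if negated else "present"
    return findings


findings = extract_symptoms(record["note"])
print("Codes:", record["icd10_codes"])
print("Text findings:", findings)
```

In this toy record the code alone says only that the patient has type 2 diabetes, while the note adds a present symptom and two explicitly ruled-out ones. That kind of context is what a text-based extraction layer is meant to recover.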