The following is a summary of “Use of Natural Language Processing to Extract and Classify Papillary Thyroid Cancer Features From Surgical Pathology Reports,” published in the August 2024 issue of Endocrinology by Loor-Torres et al.
This study aimed to use natural language processing (NLP) to automate the extraction and classification of thyroid cancer risk factors from pathology reports. Researchers analyzed 1,410 surgical pathology reports from adult patients diagnosed with papillary thyroid cancer between 2010 and 2019. The dataset included both structured reports, which followed standardized formats, and unstructured reports, which presented information in narrative form. This mix enabled the creation of a consensus-based ground truth dictionary that categorized the data into modified recurrence risk levels.
To extract thyroid cancer features and classify them into risk categories, the study group developed ThyroPath, a rule-based NLP pipeline. The pipeline was trained on 225 reports (150 structured and 75 unstructured) and tested on 170 reports (120 structured and 50 unstructured). Performance was assessed under both strict and lenient criteria using accuracy, precision, recall, and the F1 score, which is the harmonic mean of precision and recall. In extraction tasks covering 18 distinct thyroid cancer pathology features, ThyroPath achieved overall strict F1 scores of 93% for structured reports and 90% for unstructured reports. In classification tasks, the extracted information was used to categorize reports by guideline-based recurrence risk level with an overall accuracy of 93%; accuracy was 76.9% for the high-risk category, 86.8% for the intermediate-risk category, and 100% for both the low-risk and very low-risk categories.
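As a rough illustration of the evaluation metrics mentioned above, the following Python sketch computes an F1 score for extracted features under a strict and a lenient matching rule. It is not the ThyroPath implementation. The summary does not define "strict" and "lenient," so the sketch assumes that strict means an exact match of the extracted value and lenient means a partial (substring) match. The feature names, the example values, and the function names are all hypothetical.

```python
from typing import Dict


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def matches(predicted: str, gold: str, strict: bool) -> bool:
    """Assumed matching rules: strict = exact match, lenient = substring overlap."""
    p, g = predicted.strip().lower(), gold.strip().lower()
    if strict or not p or not g:
        return p == g
    return p in g or g in p


def evaluate(predicted: Dict[str, str], gold: Dict[str, str], strict: bool = True) -> float:
    """Score extracted feature values against a gold-standard dictionary."""
    tp = fp = fn = 0
    for feature, gold_value in gold.items():
        pred_value = predicted.get(feature)
        if pred_value is None:
            fn += 1  # feature present in gold but not extracted
        elif matches(pred_value, gold_value, strict):
            tp += 1
        else:
            fp += 1  # a wrong value was extracted...
            fn += 1  # ...and the correct value was missed
    # Features extracted but absent from the gold standard count as false positives.
    fp += sum(1 for feature in predicted if feature not in gold)
    return f1_score(tp, fp, fn)


# Hypothetical gold-standard and extracted values for one report.
gold = {
    "tumor_size": "1.2 cm",
    "extrathyroidal_extension": "absent",
    "margins": "negative",
}
pred = {
    "tumor_size": "1.2 cm",
    "extrathyroidal_extension": "absent",
    "margins": "negative margins",
}

strict_f1 = evaluate(pred, gold, strict=True)
lenient_f1 = evaluate(pred, gold, strict=False)
print(f"strict F1 = {strict_f1:.3f}, lenient F1 = {lenient_f1:.3f}")
```

In this toy example the "margins" value matches only under the lenient rule, so the strict score is lower than the lenient one. A real evaluation would aggregate counts over all reports and features before computing the score.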
Notably, ThyroPath achieved 100% accuracy across all risk categories when using human-extracted pathology information as a benchmark. These findings suggest that ThyroPath could automate the extraction and classification of thyroid pathology reports at scale, reducing the labor of manual review and supporting the development of virtual registries. Further validation is nonetheless required to confirm the tool's robustness and reliability before widespread implementation in clinical practice.
Source: sciencedirect.com/science/article/abs/pii/S1530891X24006578