The following is a summary of “Auto-delineation of treatment target volume for radiation therapy using large language model-aided multimodal learning,” published in the August 2024 issue of Oncology by Rajendran et al.
Artificial intelligence (AI)-aided techniques have significantly advanced the automatic delineation of normal tissues, yet challenges persist in accurately contouring radiotherapy target volumes. The study addressed this gap by framing target volume delineation as a clinical decision-making problem and solving it with large language model-aided multimodal learning.
Researchers developed a vision-language model named Medformer, which uses a hierarchical vision transformer as its backbone and integrates large language models to extract text-rich features. A visual language attention module incorporates these contextually embedded linguistic features into the visual data, enabling language-aware visual encoding. Three metrics were used to quantify the model’s performance: the Dice similarity coefficient (DSC), intersection over union (IOU), and 95th percentile Hausdorff distance (HD95). Evaluations were performed on an in-house dataset of prostate cancer and a public dataset of oropharyngeal carcinoma (OPC), comprising a total of 668 subjects.
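For readers unfamiliar with the three metrics, the following is a minimal Python sketch of how they can be computed for a pair of contours. It is not the study's evaluation code. It assumes each contour is represented as a set of voxel coordinates already expressed in millimetres, it computes HD95 over all voxels rather than surface voxels only, and it uses a nearest-rank percentile; published implementations differ on these points. The function names `dice`, `iou`, and `hd95` are illustrative.

```python
import math


def dice(pred, ref):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not pred and not ref:
        return 1.0  # assumption: two empty contours count as perfect agreement
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def iou(pred, ref):
    """Intersection over union: |A ∩ B| / |A ∪ B|."""
    if not pred and not ref:
        return 1.0  # same convention as in dice()
    return len(pred & ref) / len(pred | ref)


def hd95(pred, ref):
    """95th percentile Hausdorff distance between two point sets.

    Assumption: the larger of the two directed 95th-percentile
    nearest-neighbour distances is reported (one common convention).
    """
    if not pred or not ref:
        raise ValueError("HD95 is undefined when a contour is empty")

    def directed_p95(src, dst):
        # Distance from every source point to its nearest destination point.
        dists = sorted(min(math.dist(p, q) for q in dst) for p in src)
        rank = math.ceil(0.95 * len(dists))  # nearest-rank percentile
        return dists[rank - 1]

    return max(directed_p95(pred, ref), directed_p95(ref, pred))


# Toy 2D example: two 2x2 squares that overlap in one column.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
ref = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(dice(pred, ref))  # 0.5
print(round(iou(pred, ref), 3))  # 0.333
print(hd95(pred, ref))  # 1.0
```

DSC and IOU measure overlap, so higher values are better, while HD95 measures boundary disagreement in millimetres, so lower values are better. The brute-force nearest-neighbour search is quadratic in the number of points and is only suitable for small examples.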
Medformer outperformed conventional methods in delineating the gross tumor volume (GTV). On the prostate cancer dataset, Medformer achieved a DSC of 0.81 ± 0.10, an IOU of 0.73 ± 0.12, and an HD95 of 9.86 ± 9.77 mm, whereas traditional methods achieved a DSC of 0.72 ± 0.10, an IOU of 0.65 ± 0.09, and an HD95 of 19.13 ± 12.96 mm. On the OPC dataset, Medformer achieved a DSC of 0.77 ± 0.11, an IOU of 0.70 ± 0.09, and an HD95 of 7.52 ± 4.8 mm, whereas traditional methods achieved a DSC of 0.72 ± 0.09, an IOU of 0.65 ± 0.07, and an HD95 of 13.63 ± 7.13 mm. All improvements were statistically significant (p < 0.05); higher DSC and IOU and lower HD95 indicate closer agreement with the reference contours. For clinical target volume (CTV) delineation, Medformer achieved a DSC of 0.91 ± 0.04, an IOU of 0.85 ± 0.05, and an HD95 of 2.98 ± 1.60 mm, comparable to leading algorithms.
Integrating multimodal learning into target volume delineation significantly improved auto-contouring accuracy compared to traditional methods that rely solely on visual features. The results suggest that Medformer could be integrated into routine clinical practice, offering a more efficient and precise approach to contouring clinical and gross tumor volumes.
Source: sciencedirect.com/science/article/abs/pii/S0360301624029717