To compare the readability and accuracy of large language model-generated patient information materials (PIMs) with those supplied by the American Urological Association (AUA), Canadian Urological Association (CUA), and European Association of Urology (EAU) for kidney stones.
PIMs related to nephrolithiasis were obtained from the AUA, CUA, and EAU and categorized. The most frequent patient questions about kidney stones were identified from an internet query and input into GPT-3.5 and GPT-4. PIMs and ChatGPT outputs were assessed for accuracy and readability using previously published indexes. We also assessed changes in ChatGPT outputs when a target reading level (grade 6) was specified.
Readability scores were better for PIMs from the CUA (grade level 10–12), AUA (8–10), and EAU (9–11) than for the chatbot outputs. GPT-3.5 had the worst readability scores, at grade level 13–14, and GPT-4 was likewise less readable than the urologic organization PIMs, with scores of 11–13. While organizational PIMs were deemed accurate, the chatbot outputs had high accuracy with minor details omitted. GPT-4 was more accurate than GPT-3.5 on general stone information and on dietary and medical management of kidney stones, while both models had the same accuracy on the surgical management of nephrolithiasis.
Current PIMs from major urologic organizations for kidney stones remain more readable than publicly available GPT outputs, but their reading level still exceeds that of the general population. Of the available PIMs for kidney stones, those from the AUA are the most readable. Although chatbot outputs for common kidney stone patient queries are highly accurate, with only minor details omitted, it is important for clinicians to understand their strengths and limitations.