To assess the accuracy of ophthalmic information provided by an artificial intelligence chatbot (ChatGPT).
Five diseases from each of 8 subspecialties of ophthalmology were assessed using ChatGPT version 3.5. ChatGPT was asked three questions about each disease: “What is x?”, “How is x diagnosed?”, and “How is x treated?” (x = name of the disease). Responses were graded against the American Academy of Ophthalmology (AAO) guidelines for patients, with scores ranging from -3 (unvalidated and potentially harmful to a patient’s health or well-being if they pursue such a suggestion) to 2 (correct and complete).
Accuracy of ChatGPT responses to prompts about ophthalmic health information, scored on a scale from -3 to 2.
Of the 120 questions, 93 (77.5%) scored ≥ 1 and 27 (22.5%) scored ≤ -1; among the latter, 9 (7.5% of all questions) obtained a score of -3. The overall median score across all subspecialties was 2 for the question “What is x”, 1.5 for “How is x diagnosed”, and 1 for “How is x treated”, though this difference did not reach statistical significance on Kruskal-Wallis testing.
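As a quick arithmetic check on the figures above, the following minimal Python sketch recomputes the question total and the reported percentages. It assumes that the 120 questions arise from 8 subspecialties × 5 diseases × 3 questions and that every percentage uses all 120 questions as its denominator. The variable names and the `percent` helper are illustrative and are not taken from the study.

```python
# Study design as described in the methods (assumed: 5 diseases per subspecialty).
SUBSPECIALTIES = 8
DISEASES_PER_SUBSPECIALTY = 5
QUESTIONS_PER_DISEASE = 3

total_questions = SUBSPECIALTIES * DISEASES_PER_SUBSPECIALTY * QUESTIONS_PER_DISEASE

# Counts reported in the results; the labels are illustrative.
counts = {
    "score >= 1": 93,
    "score <= -1": 27,
    "score == -3": 9,  # subset of the "score <= -1" group
}


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {label: percent(n, total_questions) for label, n in counts.items()}

for label, value in percentages.items():
    print(f"{label}: {counts[label]}/{total_questions} = {value}%")
```

Under these assumptions the two main groups (93 and 27) account for all 120 questions.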
Despite the predominantly positive scores, ChatGPT on its own still provides incomplete, incorrect, and potentially harmful information about common ophthalmic conditions. Potentially harmful information was defined as the recommendation of invasive procedures or other interventions with potential for adverse sequelae that are not supported by the AAO for the disease in question. ChatGPT may be a valuable adjunct to patient education, but it is currently not sufficient without concomitant human medical supervision.
© 2024. The Author(s).