P0384 Applying ChatGPT to Improve Inflammatory Bowel Disease Diagnosis and Evaluation
O. Ozturk, O. Coşkun, C. Yavuz, İ. Tenlik, I. Yuksel
Abstract
Chat Generative Pre-Trained Transformer-v4 (ChatGPT-v4) is a potentially valuable asset for healthcare professionals. This paper outlines several ways in which ChatGPT-v4 can be used to support diagnosis and treatment strategies for ulcerative colitis (UC) and Crohn’s disease (CD), in line with the ECCO guidelines. A 102-item questionnaire was designed to assess the accuracy, consistency, and completeness of responses to questions about the diagnosis and treatment of UC and CD. The questionnaire consisted of true/false and multiple-choice questions based on clinical scenarios reflecting real-life situations and the ECCO guidelines. It was then presented to ChatGPT-v4, and the responses were evaluated using a Likert scale. The queries were posed to the artificial intelligence at 15-day intervals. In 47 responses (92%), ChatGPT adhered to the established guidelines, with deviations observed in 4 instances (8%). Of the 4 responses identified as non-compliant, 3 showed no change or improvement after the 15-day reassessment period. The mean accuracy scores were 5.45 for the first set and 5.56 for the second (p = 0.606). The completeness scores showed a similar trend, with mean values of 2.33 for the first set and 2.43 for the second, a difference of 0.10 (p = 0.280). The standard deviation trends for the two sets were also comparable. The analysis revealed an increase in accuracy from the initial to the fifteenth-day assessment (5.49 and 5.56, respectively). A comparable increase was observed in completeness scores, rising from 2.15 to 2.37 between the initial and fifteenth-day assessments. Responses to questions pertaining to imaging were more accurate, although the difference was not statistically significant (p = 0.31).
For the multiple-choice questions, the AI model’s performance showed greater stability and consistency (p = 0.606). The majority of answers were initially correct (90% of the total). On reassessment, four of the five incorrect answers became correct, one remained incorrect, and two initially correct answers became incorrect. ChatGPT-v4 has demonstrated potential for development as a clinical support tool in the management of inflammatory bowel diseases, including UC and CD. However, performance differed between binary and multiple-choice questions.