Development and evaluation of a large language model of ophthalmology in Chinese
British Journal of Ophthalmology (IF 3.7), Pub Date: 2024-10-01, DOI: 10.1136/bjo-2023-324526
Ce Zheng 1,2; Hongfei Ye 1,2; Jinming Guo 3; Junrui Yang 4; Ping Fei 1; Yuanzhi Yuan 5; Danqing Huang 6; Yuqiang Huang 3; Jie Peng 7; Xiaoling Xie 3; Meng Xie 1; Peiquan Zhao 1; Li Chen 8; Mingzhi Zhang 9
Background Large language models (LLMs), such as ChatGPT, have considerable implications for various medical applications. However, ChatGPT’s training draws primarily from English-centric internet data and is not tailored explicitly to the medical domain. Thus, an ophthalmic LLM in Chinese is clinically essential for both healthcare providers and patients in mainland China.

Methods We developed an LLM of ophthalmology (MOPH) using Chinese corpora and evaluated its performance in three clinical scenarios: ophthalmic board exams in Chinese, answering evidence-based medicine-oriented ophthalmic questions, and diagnostic accuracy for clinical vignettes. Additionally, we compared MOPH’s performance to that of human doctors.

Results In the ophthalmic exam, MOPH’s average score closely aligned with the mean score of trainees (64.7 (range 62–68) vs 66.2 (range 50–92), p=0.817), and MOPH scored above 60 in all seven mock exams. In answering ophthalmic questions, 83.3% (25/30) of MOPH’s responses adhered to Chinese guidelines (Likert scale 4–5). Only 6.7% (2/30, Likert scale 1–2) and 10% (3/30, Likert scale 3) of responses were rated by reviewers as ‘poor or very poor’ or as containing ‘potentially misinterpretable inaccuracies’, respectively. In diagnostic accuracy, the rate of correct diagnosis by ophthalmologists was higher than that by MOPH (96.1% vs 81.1%), but the difference was not statistically significant (p>0.05).

Conclusion This study demonstrated the promising performance of MOPH, a Chinese-specific ophthalmic LLM, in diverse clinical scenarios. MOPH has potential real-world applications in Chinese-language ophthalmology settings.

Data are available on reasonable request.
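As a quick arithmetic check on the question-answering results, the sketch below recomputes the reported percentages from the counts given in the abstract (25, 3 and 2 out of 30 responses). The variable names and the `percent` helper are illustrative assumptions, not part of the study's own analysis.

```python
# Reviewer rating counts for the 30 responses, as reported in the abstract.
# The dictionary keys are illustrative labels, not the study's own wording.
likert_counts = {
    "4-5": 25,  # responses following Chinese guidelines
    "3": 3,     # potentially misinterpretable inaccuracies
    "1-2": 2,   # poor or very poor
}

total = sum(likert_counts.values())


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: percent(n, total) for label, n in likert_counts.items()}

for label, share in shares.items():
    print(f"Likert {label}: {likert_counts[label]}/{total} = {share}%")
```

The three categories account for all 30 responses, and the recomputed shares match the 83.3%, 10% and 6.7% quoted in the Results.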
Updated: 2024-09-20