Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease
American Journal of Human Genetics (IF 8.1), Pub Date: 2024-09-09, DOI: 10.1016/j.ajhg.2024.08.010
Junyoung Kim, Kai Wang, Chunhua Weng, Cong Liu

Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs of phenotype-gene relations, recent advances in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, two from the generative pre-trained transformer (GPT) series and three from the Llama2 series, assessing their performance on task completeness, gene prediction accuracy, and adherence to the required output structure. We ran experiments across various combinations of models, prompts, phenotypic input types, and task difficulty levels. The best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within its top 50 predictions, which still falls behind traditional tools. However, accuracy increased with model size. Results were consistent over time, as shown on a dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve accuracy. Sophisticated prompts were more likely to improve task completeness, especially in smaller models, but tended to lower the rate of compliance with the output structure. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their use in clinical workflows.
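
The headline metric in the abstract, accuracy within the top 50 predictions, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `top_k_hit` and `top_k_accuracy`, the case-insensitive exact matching of gene symbols, and the toy cases are all assumptions made for illustration. It only shows one plausible way to count a case as correct when the diagnosed gene appears among the first k genes a model returns.

```python
from typing import List, Sequence, Tuple


def top_k_hit(predicted: Sequence[str], diagnosed_gene: str, k: int = 50) -> bool:
    """Return True if the diagnosed gene is among the first k predictions.

    Assumption: gene symbols are compared case-insensitively after trimming
    whitespace; the paper's exact matching rule is not given in the abstract.
    """
    target = diagnosed_gene.strip().upper()
    return any(gene.strip().upper() == target for gene in predicted[:k])


def top_k_accuracy(cases: List[Tuple[Sequence[str], str]], k: int = 50) -> float:
    """Fraction of cases whose diagnosed gene appears in the top-k list."""
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(top_k_hit(predicted, gene, k) for predicted, gene in cases)
    return hits / len(cases)


# Toy cases (invented for illustration): (ranked model output, diagnosed gene).
cases = [
    (["BRCA1", "TP53", "PTEN"], "PTEN"),  # hit at rank 3
    (["TP53", "BRCA1"], "FBN1"),          # miss: diagnosed gene never predicted
    (["fbn1", "TP53"], "FBN1"),           # hit at rank 1, ignoring case
]

print(f"top-50 accuracy: {top_k_accuracy(cases, k=50):.3f}")
print(f"top-2 accuracy:  {top_k_accuracy(cases, k=2):.3f}")
```

Lowering `k` makes the criterion stricter: in the toy data the first case stops counting as a hit at `k=2` because the diagnosed gene sits at rank 3.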

Updated: 2024-09-09