Phenotyping people with a history of injecting drug use within electronic medical records using an interactive machine learning approach
npj Digital Medicine (IF 12.4), Pub Date: 2024-11-30, DOI: 10.1038/s41746-024-01318-y
Carol El-Hayek, Thi Nguyen, Margaret E. Hellard, Michael Curtis, Rachel Sacks-Davis, Htein Linn Aung, Jason Asselin, Douglas I. R. Boyle, Anna Wilkinson, Victoria Polkinghorne, Jane S. Hocking, Adam G. Dunn

People with a history of injecting drug use are a priority for eliminating blood-borne viruses and sexually transmissible infections. Identifying them for disease surveillance in electronic medical records (EMRs) is challenged by sparsity of predictors. This study introduced a novel approach to phenotype people who have injected drugs using structured EMR data and interactive human-in-the-loop methods. We iteratively trained random forest classifiers removing important features and adding new positive labels each time. The initial model achieved 92.7% precision and 93.5% recall. Models maintained >90% precision and recall after nine iterations, revealing combinations of less obvious features influencing predictions. Applied to approximately 1.7 million patients, the final model identified 128,704 (7.7%) patients as potentially having injected drugs, beyond the 50,510 (2.9%) with known indicators of injecting drug use. This process produced explainable models that revealed otherwise hidden combinations of predictors, offering an adaptive approach to addressing the inherent challenge of inconsistently missing data in EMRs.
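The iterative loop described in the abstract (score features, remove the most important one, add new positive labels, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free it swaps the random forest for a simple prevalence-gap score, and it replaces human review with an automatic rule that accepts every record carrying the removed feature as a new positive. The feature names (`oat_rx`, `hcv_test`, `flu_vaccine`), the toy records, and the functions `prevalence_gap` and `run_rounds` are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]  # hypothetical binary EMR features: name -> 0 or 1


def prevalence_gap(rows: List[Row], labels: List[int], feature: str) -> float:
    """Stand-in for feature importance (NOT a random forest):
    how much more common the feature is among positives than among the rest."""
    pos = [r[feature] for r, y in zip(rows, labels) if y == 1]
    neg = [r[feature] for r, y in zip(rows, labels) if y == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def run_rounds(
    rows: List[Row], labels: List[int], features: List[str], rounds: int
) -> Tuple[List[str], List[int]]:
    """Repeat: find the top feature, add new positive labels, drop the feature."""
    labels = list(labels)      # copy so the caller's list is not modified
    features = list(features)
    removed: List[str] = []
    for _ in range(rounds):
        # Stop when nothing is left to score or one class is empty.
        if not features or 1 not in labels or 0 not in labels:
            break
        top = max(features, key=lambda f: prevalence_gap(rows, labels, f))
        # Assumption: stands in for human review. Every record that has the
        # top feature is accepted as a new positive label.
        labels = [1 if r[top] == 1 else y for r, y in zip(rows, labels)]
        # Remove the top feature so the next round must rely on less obvious ones.
        features.remove(top)
        removed.append(top)
    return removed, labels


# Hypothetical toy data: two known positives, three unlabelled records.
rows = [
    {"oat_rx": 1, "hcv_test": 1, "flu_vaccine": 0},
    {"oat_rx": 1, "hcv_test": 1, "flu_vaccine": 1},
    {"oat_rx": 0, "hcv_test": 1, "flu_vaccine": 0},
    {"oat_rx": 0, "hcv_test": 0, "flu_vaccine": 1},
    {"oat_rx": 0, "hcv_test": 0, "flu_vaccine": 0},
]
labels = [1, 1, 0, 0, 0]

removed, final_labels = run_rounds(
    rows, labels, ["oat_rx", "hcv_test", "flu_vaccine"], rounds=2
)
print(removed)       # features dropped, in order
print(final_labels)  # labels after two rounds
```

In this toy run the most obvious indicator is removed first, and the second round surfaces a weaker feature that pulls in one additional record. A real pipeline would train a classifier in each round and have a person review candidate labels before accepting them.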




Updated: 2024-11-30