Machine Learning-Based Prediction of Hemoglobinopathies Using Complete Blood Count Data
Clinical Chemistry (IF 7.1), Pub Date: 2024-06-22, DOI: 10.1093/clinchem/hvae081
Anoeska Schipper 1, 2 , Matthieu Rutten 2, 3 , Adriaan van Gammeren 4 , Cornelis L Harteveld 5 , Eloísa Urrechaga 6 , Floor Weerkamp 7 , Gijs den Besten 8 , Johannes Krabbe 9 , Jennichjen Slomp 9 , Lise Schoonen 7, 10 , Maarten Broeren 11 , Merel van Wijnen 12 , Mirelle J A J Huijskens 13 , Tamara Koopmann 5 , Bram van Ginneken 2 , Ron Kusters 1, 14 , Steef Kurstjens 1

Background: Hemoglobinopathies, the most common inherited blood disorders, are frequently underdiagnosed. Early identification of carriers is important for genetic counseling of couples at risk. The aim of this study was to develop and validate a novel machine learning model on a multicenter data set, covering a wide spectrum of hemoglobinopathies based on routine complete blood count (CBC) testing.

Methods: Hemoglobinopathy test results from 10 322 adults were extracted retrospectively from 8 Dutch laboratories. eXtreme Gradient Boosting (XGB) and logistic regression models were developed to differentiate negative from positive hemoglobinopathy cases, using 7 routine CBC parameters. External validation was conducted on a data set from an independent Dutch laboratory, with an additional external validation on a Spanish data set (n = 2629) specifically for differentiating thalassemia from iron deficiency anemia (IDA).

Results: The XGB and logistic regression models achieved an area under the receiver operating characteristic curve (AUROC) of 0.88 and 0.84, respectively, in distinguishing negative from positive hemoglobinopathy cases in the independent external validation set. Subclass analysis showed that the XGB model reached an AUROC of 0.97 for β-thalassemia, 0.98 for α0-thalassemia, 0.95 for homozygous α+-thalassemia, 0.78 for heterozygous α+-thalassemia, and 0.94 for the structural hemoglobin variants hemoglobin C, hemoglobin D, and hemoglobin E. Both models attained AUROCs of 0.95 in differentiating IDA from thalassemia.

Conclusions: Both the XGB and logistic regression models demonstrate high accuracy in predicting a broad range of hemoglobinopathies and are effective in differentiating hemoglobinopathies from IDA. Integration of these models into the laboratory information system facilitates automated hemoglobinopathy detection using routine CBC parameters.
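The AUROC figures reported in the abstract can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The following is a minimal sketch of that calculation in plain Python. The function name `auroc` and the example scores and labels are hypothetical and chosen for illustration only; they are not data or code from the study.

```python
def auroc(scores, labels):
    """Compute AUROC by pairwise comparison (the Mann-Whitney U formulation).

    scores: model outputs, where a higher value means "more likely positive".
    labels: 1 for a positive case, 0 for a negative case.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores and labels, for illustration only (not study data).
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auroc(example_scores, example_labels)
print(round(example_auc, 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; for data sets of the size described in the abstract, a rank-based implementation from an established library would be the more practical choice.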

Updated: 2024-06-22