Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study
The Lancet Oncology (IF 41.6), Pub Date: 2024-06-11, DOI: 10.1016/s1470-2045(24)00220-1
Anindo Saha, Joeran S Bosma, Jasper J Twilt, Bram van Ginneken, Anders Bjartell, Anwar R Padhani, David Bonekamp, Geert Villeirs, Georg Salomon, Gianluca Giannarini, Jayashree Kalpathy-Cramer, Jelle Barentsz, Klaus H Maier-Hein, Mirabela Rusu, Olivier Rouvière, Roderick van den Bergh, Valeria Panebianco, Veeru Kasivisvanathan, Nancy A Obuchowski, Derya Yakar, Mattijs Elschot, Jeroen Veltman, Jurgen J Fütterer, Maarten de Rooij, Henkjan Huisman

Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging—Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5–10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4–6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at , . Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87–0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83–0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6–63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3–92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3–72·4] vs 69·0% [65·5–72·5]) at the same sensitivity (96·1%, 94·0–98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (−0·04) was greater than the non-inferiority margin (−0·05) and a p value below the significance threshold was reached (p<0·001). An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. 
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. Funding: Health~Holland and EU Horizon 2020.
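
The abstract compares the lower boundary of a two-sided 95% CI for a difference against a non-inferiority margin of 0·05. A minimal sketch of that style of confidence-interval decision rule follows, assuming the difference is expressed as AI minus comparator. The function name `assess` and its labels are hypothetical, and the sketch is not the study's prespecified analysis, which also involves p values and a multireader, multicase design.

```python
def assess(lower_bound: float, margin: float = 0.05) -> str:
    """Classify a comparison from the lower boundary of the two-sided
    95% CI for the difference (AI minus comparator).

    Assumed rule (illustrative only):
      - lower bound above 0        -> superior (implies non-inferior)
      - lower bound above -margin  -> non-inferior
      - otherwise                  -> non-inferiority not shown
    """
    if lower_bound > 0:
        return "superior"
    if lower_bound > -margin:
        return "non-inferior"
    return "non-inferiority not shown"


# Lower boundaries as reported in the abstract.
auroc_result = assess(0.02)         # difference in AUROC, reader study
specificity_result = assess(-0.04)  # difference in specificity, routine practice

print(auroc_result)
print(specificity_result)
```

Under this assumed rule, the AUROC bound of 0·02 lies above zero, and the specificity bound of −0·04 lies between −0·05 and zero, which mirrors the comparison against the margin stated in the text.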

Updated: 2024-06-11