European Journal of Nuclear Medicine and Molecular Imaging (IF 8.6), Pub Date: 2024-11-27, DOI: 10.1007/s00259-024-06988-0
Aleksej Kucerenko, Thomas Buddenkotte, Ivayla Apostolova, Susanne Klutmann, Christian Ledig, Ralph Buchert
Purpose
Deep convolutional neural networks (CNN) hold promise for assisting the interpretation of dopamine transporter (DAT)-SPECT. For improved communication of uncertainty to the user, it is crucial to reliably discriminate certain cases from inconclusive cases that might be misclassified by strict application of a predefined decision threshold on the CNN output. This study tested two methods of incorporating existing label uncertainty during training to improve the utility of the CNN sigmoid output for this task.
Methods
Three datasets were used retrospectively: a “development” dataset (n = 1740) for CNN training, validation and testing, and two independent out-of-distribution datasets (n = 640, 645) for testing only. In the development dataset, binary classification based on visual inspection was performed independently by three well-trained readers. A ResNet-18 architecture was trained for binary classification of DAT-SPECT using either a randomly selected vote (“random vote training”, RVT), the proportion of “reduced” votes (“average vote training”, AVT) or the majority vote (“majority vote training”, MVT) across the three readers as reference standard. Balanced accuracy was computed separately for “inconclusive” sigmoid outputs (within a predefined interval around the 0.5 decision threshold) and for “certain” (non-inconclusive) sigmoid outputs.
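The three reference standards can be illustrated with a minimal sketch. It assumes each case carries three binary reader votes (1 = “reduced”, 0 = “normal”); the function name `training_targets` and the vote encoding are hypothetical and not taken from the study. Whether the random vote in RVT is drawn once or redrawn in every epoch is not stated in the abstract, so the sketch simply draws one vote per call.

```python
import random


def training_targets(votes, scheme, rng=None):
    """Turn per-case reader votes into training targets.

    votes  : list of per-case vote lists, 1 = "reduced", 0 = "normal" (assumed encoding)
    scheme : "MVT" (majority vote), "AVT" (proportion of "reduced" votes)
             or "RVT" (one randomly selected vote)
    rng    : optional random.Random instance, only used for RVT
    """
    if rng is None:
        rng = random.Random()
    targets = []
    for case_votes in votes:
        if scheme == "MVT":
            # Hard label: 1 if more than half of the readers voted "reduced".
            targets.append(1.0 if 2 * sum(case_votes) > len(case_votes) else 0.0)
        elif scheme == "AVT":
            # Soft label: fraction of "reduced" votes (0, 1/3, 2/3 or 1 for three readers).
            targets.append(sum(case_votes) / len(case_votes))
        elif scheme == "RVT":
            # Hard label taken from one randomly chosen reader.
            targets.append(float(rng.choice(case_votes)))
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return targets


votes = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
print(training_targets(votes, "MVT"))  # hard majority labels
print(training_targets(votes, "AVT"))  # soft labels expose the 2:1 splits
print(training_targets(votes, "RVT", rng=random.Random(0)))
```

With MVT, the two cases with a 2:1 split receive the same label as unanimous cases, so the disagreement is hidden from the network. AVT and RVT both pass that disagreement on, as a soft target or as label noise respectively.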
Results
The proportion of “inconclusive” test cases that had to be accepted to achieve a given balanced accuracy in the “certain” test cases was lower with RVT and AVT than with MVT in all datasets (e.g., 1.9% and 1.2% versus 2.8% for 98% balanced accuracy in “certain” test cases from the development dataset). In addition, RVT and AVT resulted in slightly higher balanced accuracy across all test cases, independent of their certainty (97.3% and 97.5% versus 97.0% in the development dataset).
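The trade-off between the proportion of “inconclusive” cases and the balanced accuracy in “certain” cases can be sketched as follows. This is an illustration under stated assumptions, not the study's evaluation code: the inconclusive interval is taken to be symmetric around the 0.5 threshold with a hypothetical `half_width` parameter, the function names are invented for this example, and the toy sigmoid outputs and labels are made up.

```python
def balanced_accuracy(labels, preds):
    """Mean of sensitivity and specificity; None if one class is absent."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        return None
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def split_by_certainty(outputs, labels, half_width, threshold=0.5):
    """Return (inconclusive fraction, balanced accuracy in certain cases,
    balanced accuracy in inconclusive cases) for a symmetric interval."""
    certain, inconclusive = [], []
    for s, y in zip(outputs, labels):
        pred = 1 if s >= threshold else 0
        group = inconclusive if abs(s - threshold) < half_width else certain
        group.append((y, pred))
    frac = len(inconclusive) / len(outputs)
    ba_certain = balanced_accuracy(*zip(*certain)) if certain else None
    ba_inconclusive = balanced_accuracy(*zip(*inconclusive)) if inconclusive else None
    return frac, ba_certain, ba_inconclusive


def smallest_inconclusive_fraction(outputs, labels, target, half_widths):
    """Scan candidate half-widths in ascending order and return the first
    (half_width, inconclusive fraction) whose certain cases reach the target."""
    for hw in sorted(half_widths):
        frac, ba_certain, _ = split_by_certainty(outputs, labels, hw)
        if ba_certain is not None and ba_certain >= target:
            return hw, frac
    return None


# Made-up sigmoid outputs and reference labels, for illustration only.
outputs = [0.02, 0.10, 0.45, 0.55, 0.60, 0.90, 0.97, 0.40]
labels = [0, 0, 1, 0, 1, 1, 1, 0]
print(split_by_certainty(outputs, labels, half_width=0.08))
print(smallest_inconclusive_fraction(outputs, labels, 0.98, [0.0, 0.02, 0.08, 0.2]))
```

A training scheme is preferable in this sense when it reaches the same target balanced accuracy in the “certain” cases while labelling a smaller fraction of cases “inconclusive”.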
Conclusion
Making between-reader discrepancy known to the CNN during training improves the utility of its sigmoid output for discriminating certain cases from inconclusive cases that might be misclassified when the predefined decision threshold is strictly applied. This does not compromise overall accuracy.
Title:
Incorporating label uncertainty during the training of convolutional neural networks improves performance for the discrimination between certain and inconclusive cases in dopamine transporter SPECT