Will Transformers change gastrointestinal endoscopic image analysis? A comparative analysis between CNNs and Transformers, in terms of performance, robustness and generalization
Medical Image Analysis (IF 10.7), Pub Date: 2024-09-16, DOI: 10.1016/j.media.2024.103348
Carolus H.J. Kusters, Tim J.M. Jaspers, Tim G.W. Boers, Martijn R. Jong, Jelmer B. Jukema, Kiki N. Fockens, Albert J. de Groof, Jacques J. Bergman, Fons van der Sommen, Peter H.N. De With

Gastrointestinal endoscopic image analysis presents significant challenges, such as considerable variations in quality due to the challenging in-body imaging environment, the often-subtle nature of abnormalities with low interobserver agreement, and the need for real-time processing. These challenges place strong requirements on the performance, generalization, robustness and complexity of deep learning-based techniques in such safety-critical applications. While Convolutional Neural Networks (CNNs) have been the go-to architecture for endoscopic image analysis, recent successes of the Transformer architecture in computer vision raise the possibility of revisiting this choice. To this end, we evaluate and compare the clinically relevant performance, generalization and robustness of state-of-the-art CNNs and Transformers for neoplasia detection in Barrett’s esophagus. We trained and validated several top-performing CNNs and Transformers on a total of 10,208 images (2,079 patients), and tested them on a total of 7,118 images (998 patients) across multiple test sets: a high-quality test set, two internal and two external generalization test sets, and a robustness test set. Furthermore, to expand the scope of the study, we conducted the performance and robustness comparisons for colonic polyp segmentation (Kvasir-SEG) and angiodysplasia detection (Giana). The results obtained for the featured models across a wide range of training set sizes demonstrate that Transformers achieve performance comparable to CNNs on various applications, show comparable or slightly improved generalization capabilities, and offer equally strong resilience and robustness against common image corruptions and perturbations. These findings confirm the viability of the Transformer architecture, which is particularly suited to the dynamic nature of endoscopic video analysis, characterized by image quality, appearance and equipment configurations that fluctuate from hospital to hospital.
The code is made publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.

Updated: 2024-09-16