Will Transformers change gastrointestinal endoscopic image analysis? A comparative analysis between CNNs and Transformers, in terms of performance, robustness and generalization

Medical Image Analysis (IF 10.7), Pub Date: 2024-09-16, DOI: 10.1016/j.media.2024.103348
Carolus H.J. Kusters, Tim J.M. Jaspers, Tim G.W. Boers, Martijn R. Jong, Jelmer B. Jukema, Kiki N. Fockens, Albert J. de Groof, Jacques J. Bergman, Fons van der Sommen, Peter H.N. De With

Gastrointestinal endoscopic image analysis presents significant challenges: considerable variation in image quality caused by the difficult in-body imaging environment, abnormalities that are often subtle and subject to low interobserver agreement, and the need for real-time processing. These challenges place strong requirements on the performance, generalization, robustness and complexity of deep learning-based techniques in such safety-critical applications. While Convolutional Neural Networks (CNNs) have been the go-to architecture for endoscopic image analysis, recent successes of the Transformer architecture in computer vision raise the question of whether this default should be revisited. To this end, we evaluate and compare the clinically relevant performance, generalization and robustness of state-of-the-art CNNs and Transformers for neoplasia detection in Barrett’s esophagus. We trained and validated several top-performing CNNs and Transformers on a total of 10,208 images (2,079 patients) and tested them on a total of 7,118 images (998 patients) across multiple test sets: a high-quality test set, two internal and two external generalization test sets, and a robustness test set. Furthermore, to expand the scope of the study, we conducted the performance and robustness comparisons for colonic polyp segmentation (Kvasir-SEG) and angiodysplasia detection (Giana). The results obtained for the featured models across a wide range of training set sizes demonstrate that Transformers achieve performance comparable to CNNs on various applications, show comparable or slightly improved generalization capabilities, and offer equally strong resilience and robustness against common image corruptions and perturbations. These findings confirm the viability of the Transformer architecture, which is particularly suited to the dynamic nature of endoscopic video analysis, characterized by image quality, appearance and equipment configurations that fluctuate from hospital to hospital.
The code is made publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.
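
To make the notion of robustness against image corruptions concrete, the following is a minimal sketch of how a model's accuracy on clean images can be compared with its accuracy on corrupted copies. It is not the authors' evaluation protocol and is not taken from their repository: the two corruptions, the toy threshold classifier `toy_predict`, the synthetic 8x8 images and all function names are assumptions chosen purely for illustration.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a grayscale image in [0, 1], then clip."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def shift_brightness(image, delta):
    """Add a constant brightness offset and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, px + delta)) for px in row] for row in image]


def accuracy(predict, images, labels):
    """Fraction of images for which predict(image) equals the label."""
    hits = sum(1 for img, y in zip(images, labels) if predict(img) == y)
    return hits / len(labels)


def robustness_report(predict, images, labels, corruptions):
    """Accuracy on the clean images and under each named corruption."""
    report = {"clean": accuracy(predict, images, labels)}
    for name, corrupt in corruptions.items():
        corrupted = [corrupt(img) for img in images]
        report[name] = accuracy(predict, corrupted, labels)
    return report


# Hypothetical stand-ins: tiny synthetic images whose mean intensity encodes
# the class, and a threshold "model". A real study would use trained networks
# and endoscopic images instead.
rng = random.Random(0)


def make_image(label):
    base = 0.7 if label == 1 else 0.3
    return [[base + rng.uniform(-0.05, 0.05) for _ in range(8)] for _ in range(8)]


def toy_predict(image):
    mean = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    return 1 if mean > 0.5 else 0


labels = [i % 2 for i in range(20)]
images = [make_image(y) for y in labels]

corruptions = {
    "gaussian_noise": lambda img: add_gaussian_noise(img, 0.1, rng),
    "brightness_shift": lambda img: shift_brightness(img, 0.3),
}

report = robustness_report(toy_predict, images, labels, corruptions)
for name, acc in report.items():
    print(f"{name}: {acc:.2f}")
```

In this toy setup the brightness shift pushes every class-0 image above the decision threshold, so accuracy under that corruption falls well below the clean accuracy. A gap of this kind between clean and corrupted accuracy is what a robustness test set is meant to expose when comparing architectures.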

Updated: 2024-09-16