Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-12-19, DOI: 10.1007/s40747-024-01671-1
Liang Xu, Mingxiao Chen, Yi Cheng, Pengwu Song, Pengfei Shao, Shuwei Shen, Peng Yao, Ronald X. Xu
The UNet architecture, based on convolutional neural networks (CNNs), has demonstrated remarkable performance in medical image analysis. However, it struggles to capture long-range dependencies because of the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by capturing global feature correlations. However, integrating Transformer modules may cause local contextual information to be lost during global feature fusion. In this work, we propose a 2D medical image segmentation model called the multi-scale cross perceptron attention network (MCPA). MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures local correlations using multiple Multi-scale Cross Perceptron modules, which facilitates the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Considering the high computational cost of 3D neural network models, and the fact that much important clinical data can only be obtained in two dimensions, MCPA focuses on 2D medical image segmentation. Furthermore, we introduce a progressive dual-branch structure (PDBS) to address the semantic segmentation of images involving finer tissue structures. This structure gradually shifts the segmentation focus of MCPA network training from large-scale structural features to more sophisticated pixel-level features. We evaluate the proposed MCPA model on several publicly available medical image datasets from different tasks and devices, including open large-scale CT (Synapse) and MRI (ACDC) datasets, and widely used 2D medical imaging datasets captured by fundus camera (DRIVE, CHASE_DB1, HRF) and OCTA (ROSE). The experimental results show that our MCPA model achieves state-of-the-art performance.
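The "spatially unfolded, concatenated, and fed through a Global Perceptron" step can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of the Global Perceptron, so plain single-head scaled dot-product self-attention is used here as a stand-in. The function names, the three feature-map sizes, the shared channel width, and the random projection weights are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def unfold(feature_map):
    """Flatten a (C, H, W) feature map into (H*W, C) tokens, one per pixel."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over all tokens (assumed stand-in
    for the Global Perceptron, whose details the abstract omits)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # (N, N), each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
channels = 8  # hypothetical shared channel width across scales

# Hypothetical feature maps at three scales: 16x16, 8x8, and 4x4.
features = [rng.standard_normal((channels, s, s)) for s in (16, 8, 4)]

# Unfold each scale into tokens, then concatenate along the token axis,
# so every token can attend to every location at every scale.
tokens = np.concatenate([unfold(f) for f in features], axis=0)

# Random projections, for illustration only (these would be learned).
w_q, w_k, w_v = (rng.standard_normal((channels, channels)) * 0.1
                 for _ in range(3))

fused, weights = global_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, fused.shape)
```

Concatenating along the token axis is what lets a single attention pass relate positions across scales; in a full model the fused tokens would presumably be split and reshaped back to per-scale feature maps before decoding.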
Title: MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation