TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers, Medical Image Analysis

Medical Image Analysis (IF 10.7), Pub Date: 2024-07-22, DOI: 10.1016/j.media.2024.103280
Jieneng Chen₁, Jieru Mei₁, Xianhang Li₂, Yongyi Lu₁, Qihang Yu₁, Qingyue Wei₃, Xiangde Luo₄, Yutong Xie₅, Ehsan Adeli₆, Yan Wang₇, Matthew P Lungren₆, Shaoting Zhang₄, Lei Xing₃, Le Lu₈, Alan Yuille₁, Yuyin Zhou₂

Medical image segmentation is crucial for healthcare, yet convolution-based methods such as U-Net are limited in their ability to model long-range dependencies. To address this, Transformers, originally designed for sequence-to-sequence prediction, have been integrated into medical image segmentation. However, a comprehensive understanding of how Transformers' self-attention behaves within individual U-Net components is still lacking. TransUNet, first introduced in 2021, is widely recognized as one of the first models to integrate Transformers into medical image analysis. In this study, we present the versatile TransUNet framework, which encapsulates Transformers' self-attention in two key modules: (1) a Transformer encoder that tokenizes image patches from a convolutional neural network (CNN) feature map, facilitating global context extraction, and (2) a Transformer decoder that refines candidate regions through cross-attention between proposals and U-Net features. These modules can be flexibly inserted into the U-Net backbone, yielding three configurations: Encoder-only, Decoder-only, and Encoder+Decoder. TransUNet provides a library with both 2D and 3D implementations, enabling users to easily tailor the chosen architecture. Our findings highlight the encoder's efficacy in modeling interactions among multiple abdominal organs and the decoder's strength in handling small targets such as tumors. TransUNet excels in diverse medical applications, including multi-organ segmentation, pancreatic tumor segmentation, and hepatic vessel segmentation. Notably, compared with the highly competitive nnU-Net, TransUNet achieves significant average Dice improvements of 1.06% for multi-organ segmentation and 4.30% for pancreatic tumor segmentation, and it surpasses the top-1 solution in the BraTS 2021 challenge. The 2D and 3D code and models are available at https://github.com/Beckschen/TransUNet and https://github.com/Beckschen/TransUNet-3D, respectively.
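The data flow of the two modules described above can be illustrated with a small, dependency-free Python sketch. This is not the TransUNet implementation. It assumes single-head attention with identity query/key/value projections, 1×1 patches taken directly from the CNN feature map, and no positional embeddings, layer normalization, or MLP layers. The function names (`tokenize`, `untokenize`, `encoder_block`, `decoder_block`) and the toy proposal vectors are hypothetical and do not come from the TransUNet repositories.

```python
import math

# Simplifying assumptions (hypothetical sketch, not the TransUNet code):
# single head, identity Q/K/V projections, 1x1 patches, no positional embeddings.
Matrix = list[list[float]]  # row-major: one inner list per token


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(keys[0]))
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, values)


def tokenize(fmap: list[Matrix]) -> Matrix:
    """Flatten a C x H x W feature map into H*W tokens of dimension C (1x1 patches)."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def untokenize(tokens: Matrix, height: int, width: int) -> list[Matrix]:
    """Inverse of tokenize: fold H*W tokens back into a C x H x W map."""
    channels = len(tokens[0])
    return [[[tokens[i * width + j][c] for j in range(width)] for i in range(height)]
            for c in range(channels)]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encoder_block(tokens: Matrix) -> Matrix:
    """Self-attention: every spatial token attends to every other token (global context)."""
    return add(tokens, attention(tokens, tokens, tokens))


def decoder_block(proposals: Matrix, tokens: Matrix) -> Matrix:
    """Cross-attention: each proposal query gathers evidence from the feature tokens."""
    return add(proposals, attention(proposals, tokens, tokens))


if __name__ == "__main__":
    # Toy 4-channel, 2x3 feature map standing in for a CNN stage output.
    fmap = [[[(c + 3 * i + j) / 10 for j in range(3)] for i in range(2)] for c in range(4)]
    tokens = tokenize(fmap)                   # 6 tokens of dimension 4
    encoded = encoder_block(tokens)
    refined_map = untokenize(encoded, 2, 3)   # back to 4 x 2 x 3 for the U-Net path
    proposals = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.1]]  # hypothetical queries
    refined = decoder_block(proposals, encoded)
    print(len(tokens), len(tokens[0]), len(refined_map), len(refined))  # prints: 6 4 4 2
```

The printed shapes show six 4-dimensional tokens from the 2×3 map, a 4-channel map after folding, and two refined proposals. Real Transformer layers add learned projections, multiple heads, normalization, and MLPs; they are left out here so that the difference between the encoder's self-attention and the decoder's proposal-to-feature cross-attention stays visible.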

Updated: 2024-07-22