Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy
Medical Image Analysis (IF 10.7), Pub Date: 2024-11-04, DOI: 10.1016/j.media.2024.103379
Pedro Esteban Chavarrias Solano, Andrew Bulpitt, Venkataraman Subramanian, Sharib Ali
Colonoscopy screening is the gold-standard procedure for assessing abnormalities in the colon and rectum, such as ulcers and cancerous polyps. Measuring the abnormal mucosal area and reconstructing it in 3D can help quantify the surveyed area and objectively evaluate disease burden. However, due to the complex topology of these organs and variable physical conditions (for example, lighting, large homogeneous textures, and image modality), estimating distance from the camera (i.e., depth) is highly challenging. Moreover, most colonoscopic video acquisition is monocular, making depth estimation a non-trivial problem. While computer-vision methods for depth estimation have been proposed and advanced on natural-scene datasets, the efficacy of these techniques has not been widely quantified on colonoscopy datasets. As the colonic mucosa has several low-texture regions that are not well pronounced, learning representations from an auxiliary task can improve salient feature extraction, enabling accurate camera-depth estimation. In this work, we propose a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator decoder. Our depth estimator incorporates attention mechanisms to enhance global context awareness. We leverage the surface normal prediction to improve geometric feature extraction. We also apply a cross-task consistency loss between the two geometrically related tasks, surface normal and camera depth. We demonstrate a 15.75% improvement in relative error and a 10.7% improvement in δ < 1.25 accuracy over the most accurate state-of-the-art baseline, the Big-to-Small (BTS) approach. All experiments are conducted on the recently released C3VD dataset, and we thus provide a first benchmark of state-of-the-art methods on this dataset.
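To make the described design concrete, below is a minimal PyTorch sketch of a shared encoder with a depth decoder and a surface-normal decoder, coupled by a cross-task consistency term, together with the standard δ < 1.25 accuracy metric. The layer choices, the finite-difference normal-from-depth derivation, and the cosine consistency loss are illustrative assumptions only, not the authors' implementation (which additionally uses attention in the depth decoder).

# Illustrative sketch only: a shared-encoder, two-decoder multi-task network
# with a cross-task consistency loss between predicted surface normals and
# normals derived from the predicted depth. All design details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Upsampling decoder: out_ch=1 for depth, out_ch=3 for surface normals."""
    def __init__(self, feat=64, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat, out_ch, 4, stride=2, padding=1),
        )
    def forward(self, f):
        return self.net(f)

class MultiTaskDepthNormal(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = SharedEncoder()
        self.depth_head = Decoder(out_ch=1)   # camera depth branch
        self.normal_head = Decoder(out_ch=3)  # surface normal branch
    def forward(self, x):
        f = self.encoder(x)                    # shared representation
        depth = F.softplus(self.depth_head(f))             # positive depth
        normals = F.normalize(self.normal_head(f), dim=1)  # unit normals
        return depth, normals

def normals_from_depth(depth):
    """Approximate normals from depth via finite differences (one simple way
    to couple the two tasks; not the paper's exact formulation)."""
    dzdx = F.pad(depth[:, :, :, 1:] - depth[:, :, :, :-1], (0, 1, 0, 0))
    dzdy = F.pad(depth[:, :, 1:, :] - depth[:, :, :-1, :], (0, 0, 0, 1))
    n = torch.cat([-dzdx, -dzdy, torch.ones_like(depth)], dim=1)
    return F.normalize(n, dim=1)

def cross_task_consistency(pred_depth, pred_normals):
    """Penalise disagreement between the normal decoder's output and normals
    recovered from the predicted depth (cosine distance, averaged per pixel)."""
    derived = normals_from_depth(pred_depth)
    return (1.0 - (pred_normals * derived).sum(dim=1)).mean()

def delta_accuracy(pred, gt, threshold=1.25):
    """Standard depth metric: fraction of pixels with max(pred/gt, gt/pred) < 1.25."""
    ratio = torch.max(pred / gt, gt / pred)
    return (ratio < threshold).float().mean()

if __name__ == "__main__":
    model = MultiTaskDepthNormal()
    img = torch.rand(2, 3, 64, 64)
    depth, normals = model(img)
    print(depth.shape, normals.shape, cross_task_consistency(depth, normals).item())

In this sketch the consistency term would simply be added to the supervised depth and normal losses with a weighting factor; the relative-error and δ < 1.25 figures reported above come from the paper's experiments on C3VD, not from this toy model.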
Updated: 2024-11-04