Image depth estimation assisted by multi-view projection
Complex & Intelligent Systems ( IF 5.0 ) Pub Date : 2024-12-05 , DOI: 10.1007/s40747-024-01688-6
Liman Liu, Jinshan Tian, Guansheng Luo, Siyuan Xu, Chen Zhang, Huaifei Hu, Wenbing Tao

In recent years, deep learning has significantly advanced image depth estimation. A depth estimation network with single-view input can extract features only from a single 2D image and neglects the information contained in neighboring views; the learned features therefore lack real 3D geometric information and strong constraints on the 3D structure, which limits depth estimation performance. Moreover, in the absence of accurate camera information, the multi-view geometric cues obtained by some methods may not accurately reflect the real 3D structure, leaving image depth estimation algorithms without reliable multi-view geometric constraints. To address this problem, a multi-view projection-assisted image depth estimation network is proposed, which integrates multi-view stereo vision into a deep-learning-based encoder-decoder depth estimation framework without requiring pre-estimated camera poses. The network estimates optical flow for pixel-level matching across views and thereby projects the features of neighboring views onto the reference viewpoint for self-attentive feature aggregation, compensating for the missing stereo-geometry information in the depth estimation framework. A multi-view reprojection error is designed to supervise optical flow estimation and effectively constrain the flow estimation process. In addition, a long-distance attention decoding module is proposed to effectively extract and aggregate features from distant areas of the scene, enhancing perception of distant outdoor regions. Experimental results on the KITTI, vKITTI, and SeasonDepth datasets demonstrate that our method achieves significant improvements over other state-of-the-art depth estimation techniques, confirming its superior performance in image depth estimation.
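The core idea of flow-based cross-view projection can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the bilinear-warp details, and the simple mean-absolute reprojection error are our own assumptions for illustration, and the self-attention aggregation and learned flow network are omitted. It shows only how an estimated per-pixel flow warps neighbor-view features into the reference view, and how a reprojection error can then supervise that flow.

```python
import numpy as np

def warp_with_flow(feat, flow):
    """Warp a neighbor-view feature map into the reference view.

    feat: (H, W, C) neighbor-view features.
    flow: (H, W, 2) per-pixel displacement (dx, dy) mapping each
          reference pixel to its matching neighbor-view pixel.
    Uses bilinear sampling; out-of-bounds samples are clamped to the border.
    """
    H, W, _ = feat.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    sx = np.clip(xs + flow[..., 0], 0, W - 1)   # sample x in neighbor view
    sy = np.clip(ys + flow[..., 1], 0, H - 1)   # sample y in neighbor view
    x0 = np.floor(sx).astype(int); x1 = np.clip(x0 + 1, 0, W - 1)
    y0 = np.floor(sy).astype(int); y1 = np.clip(y0 + 1, 0, H - 1)
    wx = (sx - x0)[..., None]; wy = (sy - y0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def reprojection_error(ref_feat, nbr_feat, flow):
    """Mean absolute difference between reference features and the
    flow-warped neighbor features; a correct flow drives this toward 0,
    so it can serve as a supervision signal for flow estimation."""
    return float(np.mean(np.abs(ref_feat - warp_with_flow(nbr_feat, flow))))
```

As a sanity check, if the neighbor view is the reference shifted by one pixel, the flow that undoes that shift makes the warped features match the reference (up to border pixels), giving near-zero reprojection error; an incorrect flow leaves a large residual, which is the signal the paper uses to constrain flow estimation.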



