A unified feature-motion consistency framework for robust image matching
ISPRS Journal of Photogrammetry and Remote Sensing (IF 10.6), Pub Date: 2024-09-25, DOI: 10.1016/j.isprsjprs.2024.09.021
Yan Zhou, Jinding Gao, Xiaoping Liu

Establishing reliable feature matches between a pair of images in varied scenarios is a long-standing open problem in photogrammetry. Attention-based detector-free matching with a coarse-to-fine architecture has become a typical pipeline for building matches, but the cross-attention module's global receptive field may compromise structural local consistency by introducing irrelevant regions (outliers). A motion field can maintain structural local consistency under the assumption that matches for adjacent features should be spatially proximate. However, a motion field can only estimate local displacements between consecutive images and struggles with long-range displacement estimation in large-scale-variation scenarios without spatial correlation priors. Moreover, large scale variations may also disrupt the geometric consistency enforced by the mutual-nearest-neighbor criterion in patch-level matching, making it difficult to recover accurate matches. In this paper, we propose a unified feature-motion consistency framework for robust image matching (MOMA) that maintains structural consistency at both global and local granularity in scale-discrepancy scenarios. MOMA devises a motion consistency-guided dependency range strategy (MDR) in cross attention, aggregating highly relevant regions within a motion consensus-restricted neighborhood to favor true matchable regions. Meanwhile, a unified framework with a hierarchical attention structure is established to couple the local motion field with global feature correspondence. The motion field provides local consistency constraints in feature aggregation, while feature correspondence provides a spatial context prior to improve motion field estimation. To alleviate the geometric inconsistency caused by the hard nearest-neighbor criterion, we propose an adaptive (soft) neighbor search strategy to address scale discrepancy.
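The core MDR idea — restricting cross attention to a motion consensus-restricted neighborhood so that irrelevant regions are excluded from aggregation — can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the authors' implementation: the function name, the dense coordinate/motion-field inputs, the radius parameter, and the fallback to global attention for empty neighborhoods are all assumptions.

```python
import numpy as np

def motion_restricted_cross_attention(feat_a, feat_b, coords_a, coords_b,
                                      motion, radius):
    """Cross attention where each query feature of image A attends only to
    features of image B lying within `radius` of its motion-predicted
    location (hypothetical sketch of a motion consensus-restricted
    neighborhood, not the paper's code)."""
    # Predicted positions of A's features in image B via the motion field.
    pred = coords_a + motion                                  # (N, 2)
    # Pairwise distances between predictions and B's feature coordinates.
    dist = np.linalg.norm(pred[:, None, :] - coords_b[None, :, :], axis=-1)
    mask = dist <= radius                                     # (N, M) neighborhood
    raw = feat_a @ feat_b.T / np.sqrt(feat_a.shape[1])        # scaled dot-product
    scores = np.where(mask, raw, -np.inf)                     # drop outlier regions
    # Queries with an empty neighborhood fall back to global attention
    # (an assumption; the paper's handling may differ).
    empty = ~mask.any(axis=1)
    scores[empty] = raw[empty]
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ feat_b                                         # aggregated messages
```

The mask is what distinguishes this from vanilla global cross attention: attention weights outside the motion-consistent neighborhood are forced to zero before normalization.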
Extensive experiments on three datasets demonstrate that our method outperforms strong baselines, with AUC improvements of 4.73/4.02/3.34 in the two-view pose estimation task at thresholds of 5°/10°/20° on the MegaDepth test set, and a 5.94% accuracy increase at a 1 px threshold in the homography estimation task on the HPatches dataset. Furthermore, in downstream tasks such as 3D mapping, the 3D models reconstructed with our method on the self-collected SYSU UAV dataset show significant improvements in structural completeness and detail richness, demonstrating broad applicability to downstream tasks. The code is publicly available at https://github.com/BunnyanChou/MOMA.
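The AUC figures at 5°/10°/20° refer to the area under the cumulative pose-error curve, a standard two-view pose estimation metric. A minimal sketch of that metric follows; it is not tied to this paper's code, and the per-pair error definition (commonly the maximum of the rotation and translation angular errors, in degrees) is an assumption.

```python
import numpy as np

def pose_auc(errors, thresholds=(5.0, 10.0, 20.0)):
    """Area under the cumulative pose-error curve up to each threshold,
    normalized so a perfect method (all errors near 0) scores 1.
    `errors` holds one pose error per image pair, in degrees."""
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)   # cumulative fraction
    errors = np.concatenate(([0.0], errors))
    recall = np.concatenate(([0.0], recall))
    aucs = []
    for t in thresholds:
        # Clip the recall-vs-error curve at the threshold, then integrate.
        last = np.searchsorted(errors, t)
        e = np.concatenate((errors[:last], [t]))
        r = np.concatenate((recall[:last], [recall[last - 1]]))
        aucs.append(np.trapz(r, e) / t)
    return aucs
```

Reported "AUC improvements" are then simple differences between one method's AUC values and a baseline's at the same thresholds.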
