A unified feature-motion consistency framework for robust image matching
ISPRS Journal of Photogrammetry and Remote Sensing (IF 10.6), Pub Date: 2024-09-25, DOI: 10.1016/j.isprsjprs.2024.09.021
Yan Zhou, Jinding Gao, Xiaoping Liu

Establishing reliable feature matches between a pair of images in various scenarios is a long-standing open problem in photogrammetry. Attention-based detector-free matching with a coarse-to-fine architecture has become a typical pipeline for building matches, but a cross-attention module with a global receptive field may compromise local structural consistency by introducing irrelevant regions (outliers). A motion field can maintain local structural consistency under the assumption that matches for adjacent features should be spatially proximate. However, a motion field can only estimate local displacements between consecutive images, and it struggles with long-range displacement estimation in large-scale variation scenarios without spatial correlation priors. Moreover, large-scale variations may also disrupt geometric consistency when the mutual nearest neighbor criterion is applied in patch-level matching, making it difficult to recover accurate matches. In this paper, we propose a unified feature-motion consistency framework for robust image matching (MOMA) that maintains structural consistency at both global and local granularity in scale-discrepancy scenarios. MOMA devises a motion consistency-guided dependency range strategy (MDR) in cross-attention, aggregating highly relevant regions within the motion consensus-restricted neighborhood to favor truly matchable regions. Meanwhile, a unified framework with a hierarchical attention structure is established to couple the local motion field with global feature correspondence. The motion field provides local consistency constraints in feature aggregation, while feature correspondence provides a spatial context prior that improves motion field estimation. To alleviate the geometric inconsistency caused by the hard nearest neighbor criterion, we propose an adaptive (soft) neighbor search strategy to address scale discrepancy.
Extensive experiments on three datasets demonstrate that our method outperforms solid baselines, with AUC improvements of 4.73/4.02/3.34 in the two-view pose estimation task at thresholds of 5°/10°/20° on the MegaDepth test set, and a 5.94% increase in accuracy at a threshold of 1 px in the homography task on the HPatches dataset. Furthermore, in downstream tasks such as 3D mapping, the 3D models reconstructed using our method on the self-collected SYSU UAV datasets exhibit significant improvement in structural completeness and detail richness, demonstrating its applicability to a wide range of downstream tasks. The code is publicly available at https://github.com/BunnyanChou/MOMA.
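
To make the hard mutual nearest neighbor criterion mentioned above concrete, the following is a minimal sketch in plain Python. It is not taken from the MOMA code base: the similarity matrix, the function names, and the top-k relaxation with the parameters `k` and `min_sim` are hypothetical, chosen only to illustrate why a hard one-to-one rule can discard valid matches when one coarse patch overlaps several patches in the other image. The paper's adaptive neighbor search may work differently.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mutual_nearest_neighbors(sim: Matrix) -> List[Tuple[int, int]]:
    """Hard criterion: keep (i, j) only if j is the best column for row i
    and i is the best row for column j."""
    n_rows, n_cols = len(sim), len(sim[0])
    best_col = [max(range(n_cols), key=lambda j: sim[i][j]) for i in range(n_rows)]
    best_row = [max(range(n_rows), key=lambda i: sim[i][j]) for j in range(n_cols)]
    return [(i, j) for i, j in enumerate(best_col) if best_row[j] == i]


def topk_mutual_neighbors(sim: Matrix, k: int = 2, min_sim: float = 0.5) -> List[Tuple[int, int]]:
    """Hypothetical relaxation (an assumption, not the paper's method):
    keep (i, j) if j is among the top-k columns of row i, i is among the
    top-k rows of column j, and the similarity is at least min_sim."""
    n_rows, n_cols = len(sim), len(sim[0])
    row_topk = [set(sorted(range(n_cols), key=lambda j: -sim[i][j])[:k]) for i in range(n_rows)]
    col_topk = [set(sorted(range(n_rows), key=lambda i: -sim[i][j])[:k]) for j in range(n_cols)]
    return [
        (i, j)
        for i in range(n_rows)
        for j in sorted(row_topk[i])
        if i in col_topk[j] and sim[i][j] >= min_sim
    ]


# Toy patch-level similarities: rows are patches of image A, columns are
# patches of image B. Patch 0 of A resembles both patch 0 and patch 1 of B,
# as can happen when A is the lower-resolution view of the same structure.
sim = [
    [0.9, 0.8, 0.1, 0.0],
    [0.2, 0.3, 0.7, 0.1],
    [0.1, 0.0, 0.2, 0.6],
]

hard_matches = mutual_nearest_neighbors(sim)
soft_matches = topk_mutual_neighbors(sim, k=2, min_sim=0.5)
print(hard_matches)
print(soft_matches)
```

In this toy case the hard criterion returns one match per row, so the pair (0, 1) is dropped despite its high similarity, whereas the relaxed rule keeps it. The `min_sim` threshold is what stops the relaxation from admitting weak pairs such as (1, 1).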


