TopicFM+: Boosting Accuracy and Efficiency of Topic-Assisted Feature Matching
IEEE Transactions on Image Processing (IF 10.8) Pub Date: 2024-10-17, DOI: 10.1109/tip.2024.3473301
Khang Truong Giang, Soohwan Song, Sungho Jo

This study tackles image matching in difficult scenarios, such as scenes with significant variations or limited texture, with a strong emphasis on computational efficiency. Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers. However, these approaches have high computational costs and may not capture sufficient high-level contextual information, such as spatial structures or semantic shapes. To overcome these limitations, we propose a novel image-matching method that leverages a topic-modeling strategy to capture high-level contexts in images. Our method represents each image as a multinomial distribution over topics, where each topic represents a semantic structure. By incorporating these topics, we can effectively capture comprehensive context information and obtain discriminative, high-quality features. Notably, our coarse-level matching network improves efficiency by applying attention layers only to fixed-size topics and compact feature sets. Finally, we design a dynamic feature refinement network to obtain precise results at the finer matching stage. Through extensive experiments, we demonstrate the superiority of our method in challenging scenarios. Specifically, our method ranks in the top 9% in the Image Matching Challenge 2023 without using ensemble techniques. Additionally, we achieve an approximately 50% reduction in computational costs compared to other Transformer-based methods. Code is available at https://github.com/TruongKhang/TopicFM.
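The efficiency claim in the abstract rests on attending to a small, fixed number of topic embeddings rather than performing all-pairs attention over the features themselves. The following NumPy sketch illustrates that idea only; all names are hypothetical, and the actual TopicFM+ network uses learned attention layers rather than this single dot-product step.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topic_attention(feats, topics):
    """Cross-attend N local features to K fixed topic embeddings.

    Each feature gets a distribution over topics (cf. the multinomial
    topic distribution in the paper), and topic context is mixed back
    into the features. Cost is O(N*K) instead of the O(N^2) of
    all-pairs self-attention, since K is a small constant.
    """
    d = feats.shape[-1]
    # (N, K) per-feature topic-assignment weights
    attn = softmax(feats @ topics.T / np.sqrt(d))
    # Aggregate topic context back into each feature (residual form)
    return feats + attn @ topics

rng = np.random.default_rng(0)
N, K, d = 4096, 16, 64              # many coarse features, few topics
feats = rng.standard_normal((N, d))
topics = rng.standard_normal((K, d))
out = topic_attention(feats, topics)
print(out.shape)                    # (4096, 64)
```

With N = 4096 and K = 16, the attention matrix has 65,536 entries instead of the ~16.8 million an all-pairs scheme would need, which is the source of the roughly 50% cost reduction reported relative to full-attention Transformer matchers.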

Updated: 2024-10-17