International Journal of Computer Vision (IF 11.6), Pub Date: 2024-12-07, DOI: 10.1007/s11263-024-02311-4
Andong Lu, Chenglong Li, Jiacong Zhao, Jin Tang, Bin Luo
Current RGBT tracking research relies on complete multi-modality input, but modality information may be missing due to factors such as thermal sensor self-calibration and data transmission errors. We call this the modality-missing challenge. To address it, we propose a novel invertible prompt learning approach that integrates content-preserving prompts into a well-trained tracking model so that it adapts to various modality-missing scenarios for robust RGBT tracking. Given a modality-missing scenario, we use the available modality to generate a prompt for the missing modality, so that the RGBT tracking model can adapt to that scenario. However, the cross-modality gap between the available and missing modalities usually causes semantic distortion and information loss during prompt generation. To handle this issue, we design an invertible prompter that incorporates full reconstruction of the available input modality from the generated prompt. To provide a comprehensive evaluation platform, we construct several high-quality benchmark datasets that cover various modality-missing scenarios to simulate real-world challenges. Extensive experiments on three modality-missing benchmark datasets show that our method achieves significant performance improvements over state-of-the-art methods. We have released the code and simulation datasets at: https://github.com/mmic-lcl.
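The abstract does not describe the prompter's architecture, so the following is only a minimal sketch of what "invertible" means in general: a mapping whose input can be fully reconstructed from its output. It assumes an additive coupling layer, a standard invertible building block that is not necessarily what the authors use. The names `toy_transform`, `coupling_forward`, and `coupling_inverse` are hypothetical, and the fixed `toy_transform` stands in for what would be a learned network.

```python
import math
from typing import Callable, List

Vector = List[float]


def toy_transform(half: Vector) -> Vector:
    """Hypothetical stand-in for a learned sub-network (assumption, not from the paper)."""
    return [0.5 * math.tanh(v) + 0.1 for v in half]


def coupling_forward(x: Vector, f: Callable[[Vector], Vector] = toy_transform) -> Vector:
    """Additive coupling: keep the first half, shift the second half by f(first half)."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    mid = len(x) // 2
    x1, x2 = x[:mid], x[mid:]
    shift = f(x1)
    y2 = [b + s for b, s in zip(x2, shift)]
    return x1 + y2


def coupling_inverse(y: Vector, f: Callable[[Vector], Vector] = toy_transform) -> Vector:
    """Exact inverse: the first half is unchanged, so the same shift can be recomputed."""
    if len(y) % 2 != 0:
        raise ValueError("input length must be even")
    mid = len(y) // 2
    y1, y2 = y[:mid], y[mid:]
    shift = f(y1)
    x2 = [b - s for b, s in zip(y2, shift)]
    return y1 + x2


if __name__ == "__main__":
    features = [0.2, -1.0, 3.5, 0.7]      # toy "available modality" features
    prompt = coupling_forward(features)    # toy "generated prompt"
    restored = coupling_inverse(prompt)    # reconstruction of the input
    print(prompt)
    print(restored)
```

Because the inverse recomputes the same shift from the untouched half, the reconstruction matches the input up to floating-point rounding, whatever function is used for `toy_transform`. This is the property that lets an invertible design limit information loss.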
Title: Modality-missing RGBT Tracking: Invertible Prompt Learning and High-quality Benchmarks