Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-12-19, DOI: 10.1007/s40747-024-01672-0
Zhenhai Wang, Lutao Yuan, Ying Ren, Sen Zhang, Hongyu Tian
The most common approach to visual object tracking feeds an image pair, consisting of a template image and a search region, into a tracker, which processes the pair with a backbone network. In pure Transformer-based frameworks, redundant information in the image pair persists throughout the tracking process: the corresponding negative tokens consume the same computational resources as the positive tokens while degrading the performance of the tracker. To address this problem, we propose ADSTrack, a pure Transformer-based tracker built around an adaptive dynamic sampling strategy. ADSTrack progressively removes redundant negative tokens in the search region that are unrelated to the tracked object, which also reduces the effect of the noise these tokens generate. The strategy scores the tokens and adaptively samples the important ones, so the number of sampled tokens varies with the input image. It is also parameter-free: token sampling introduces no additional parameters. In addition, we add several auxiliary tokens to the backbone to further optimize the feature map. We evaluate ADSTrack extensively on seven test sets, including UAV123 and LaSOT, and obtain satisfactory results.
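The abstract does not spell out how tokens are scored or how the sample size is chosen. As a rough illustration only, the sketch below assumes one plausible parameter-free scheme: each search-region token is scored by the average attention it receives from the template tokens, and the highest-scoring tokens are kept until their scores cover a fixed share of the total. The function names, the choice of template-to-search attention as the score, and the `mass` threshold (a fixed hyperparameter, not a learned weight) are assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def score_search_tokens(template_tokens, search_tokens):
    """Score each search token by the mean attention it gets from the template.

    Hypothetical scoring rule: scaled dot-product attention with template
    tokens as queries and search tokens as keys. No learned weights are used.
    The returned scores sum to 1 (up to rounding).
    """
    dim = len(template_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    scores = [0.0] * len(search_tokens)
    for query in template_tokens:
        logits = [scale * sum(q * k for q, k in zip(query, key))
                  for key in search_tokens]
        for i, weight in enumerate(softmax(logits)):
            scores[i] += weight / len(template_tokens)
    return scores


def adaptive_sample(scores, mass=0.9):
    """Keep the fewest top-scoring tokens whose scores sum to at least `mass`.

    The number of kept tokens depends on how concentrated the scores are,
    so it changes from one input to the next. `mass` is an assumed threshold.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept, covered = [], 0.0
    for i in order:
        kept.append(i)
        covered += scores[i]
        if covered >= mass:
            break
    return sorted(kept)  # restore spatial order of the surviving tokens


# Toy inputs: one 2-D template token and four 2-D search tokens.
template = [[1.0, 0.0]]
peaked_search = [[10.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
flat_search = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

print(adaptive_sample(score_search_tokens(template, peaked_search)))  # [0]
print(adaptive_sample(score_search_tokens(template, flat_search)))    # [0, 1, 2, 3]
```

In the first toy input a single search token dominates the attention, so only that token survives; in the second the scores are uniform and every token is kept. This mirrors the abstract's point that the number of sampled tokens depends on the input image, although the tracker's actual scoring and sampling rules may differ.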
Title: ADSTrack: Adaptive Dynamic Sampling for Visual Tracking