Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-07-19, DOI: 10.1007/s40747-024-01558-1, Haoyuan Zhang
Most existing 3D action recognition methods rely on supervised learning, yet the scarcity of annotated data keeps encoding networks from reaching their full potential. As a result, effective self-supervised pre-training strategies have been actively researched. In this paper, we explore a self-supervised learning approach for 3D action recognition and propose the Attention-guided Mask Learning (AML) scheme. Specifically, a dropping mechanism is introduced into contrastive learning, yielding an Attention-guided Mask (AM) module and a mask learning strategy. The AM module uses spatial and temporal attention to guide the masking of the corresponding features, producing a masked contrastive object. The mask learning strategy trains the model to discriminate between actions even when important features are masked, which makes the learned action representations more discriminative. Moreover, to relax the strict positive constraint that would otherwise hinder representation learning, a positive-enhanced learning strategy is applied in the second training stage. Extensive experiments on the NTU-60, NTU-120, and PKU-MMD datasets show that the proposed AML scheme improves performance in self-supervised 3D action recognition, achieving state-of-the-art results.
Title: Attention-guided mask learning for self-supervised 3D action recognition
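The abstract describes the AM module only at a high level, so the following is a minimal sketch of attention-guided masking in general, not the authors' implementation. It assumes a feature tensor of shape (frames, joints, channels), uses mean absolute activation as a stand-in for learned spatial and temporal attention, and zeroes out the most-attended joints and frames to form a masked view for contrastive learning. The function names, the drop ratios, and the attention proxy are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_guided_mask(features, drop_ratio_joint=0.2, drop_ratio_frame=0.2):
    """Zero out the most-attended joints and frames of a feature tensor.

    features: array of shape (T, V, C) = (frames, joints, channels).
    Returns (masked_features, mask), where mask has shape (T, V, 1).

    Assumption: attention is approximated by mean absolute activation.
    A trained model would supply learned attention weights instead.
    """
    T, V, _ = features.shape

    # Hypothetical attention proxy: activation energy per (frame, joint).
    energy = np.abs(features).mean(axis=2)        # (T, V)
    spatial_att = softmax(energy.mean(axis=0))    # (V,) one weight per joint
    temporal_att = softmax(energy.mean(axis=1))   # (T,) one weight per frame

    # Number of joints / frames to drop.
    k_v = int(round(drop_ratio_joint * V))
    k_t = int(round(drop_ratio_frame * T))

    # Indices of the most-attended joints and frames.
    top_joints = np.argsort(spatial_att)[::-1][:k_v]
    top_frames = np.argsort(temporal_att)[::-1][:k_t]

    # Build a binary mask that removes exactly those positions.
    mask = np.ones((T, V, 1), dtype=features.dtype)
    mask[:, top_joints, :] = 0
    mask[top_frames, :, :] = 0

    return features * mask, mask
```

In a contrastive setup, the masked output would serve as an additional, harder view of the same sequence, so the encoder has to separate actions without relying on its most salient joints and frames.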