Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
Complex & Intelligent Systems (IF 5.0) Pub Date: 2024-11-08, DOI: 10.1007/s40747-024-01646-2
Yongkang Ding, Jiechen Li, Hao Wang, Ziang Liu, Anqi Wang

Clothes-changing person re-identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. This poses significant challenges to traditional person re-identification techniques that rely on clothing features, including clothing inconsistency across views and the difficulty of learning reliable clothing-irrelevant local features. To address this issue, we propose a novel network architecture called the Attention-Enhanced Multimodal Feature Fusion Network (AE-Net). AE-Net effectively mitigates the impact of clothing changes on recognition accuracy by integrating RGB global features, grayscale image features, and clothing-irrelevant features obtained through semantic segmentation. Specifically, global features capture the overall appearance of the person; grayscale image features help eliminate color interference in recognition; and clothing-irrelevant features derived from semantic segmentation compel the model to learn features independent of the person's clothing. Additionally, we introduce a multi-scale fusion attention mechanism that further enhances the model's ability to capture both fine details and global structure, thereby improving recognition accuracy and robustness. Extensive experiments demonstrate that AE-Net outperforms several state-of-the-art methods on the PRCC and LTCC datasets, particularly in scenarios with significant clothing changes, achieving Top-1 accuracy of 60.4% and 42.9%, respectively.
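To make the three-branch fusion idea concrete, below is a minimal PyTorch sketch of the architecture the abstract describes: an RGB branch, a grayscale branch, and a clothing-masked branch, each encoded separately and then combined by a multi-scale fusion attention module. The encoder sizes, the attention design, and all names here (MultiScaleFusionAttention, AENetSketch, and so on) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; the paper's actual AE-Net architecture may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleFusionAttention(nn.Module):
    """Hypothetical multi-scale fusion attention: describe the fused map at
    several pooling scales, predict one weight per branch, and reweight."""

    def __init__(self, channels: int, num_branches: int = 3, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.score = nn.Sequential(
            nn.Linear(channels * len(scales), channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, num_branches),
        )

    def forward(self, branch_feats):  # list of (B, C, H, W) tensors
        fused = torch.stack(branch_feats).sum(dim=0)          # (B, C, H, W)
        descs = []
        for s in self.scales:
            pooled = F.adaptive_avg_pool2d(fused, s)          # (B, C, s, s)
            descs.append(pooled.flatten(2).amax(dim=2))       # (B, C) per scale
        weights = torch.softmax(self.score(torch.cat(descs, dim=1)), dim=1)
        out = torch.zeros_like(fused)
        for i, feat in enumerate(branch_feats):
            out = out + feat * weights[:, i].view(-1, 1, 1, 1)
        return out


class AENetSketch(nn.Module):
    """Toy stand-in for AE-Net: RGB, grayscale, and clothing-masked branches
    are encoded separately, then fused by the attention module above."""

    def __init__(self, channels: int = 128, num_ids: int = 150):
        super().__init__()

        def encoder(in_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )

        self.rgb_enc = encoder(3)   # global appearance
        self.gray_enc = encoder(1)  # grayscale input suppresses color cues
        self.mask_enc = encoder(3)  # input with clothing regions masked out
        self.attn = MultiScaleFusionAttention(channels)
        self.classifier = nn.Linear(channels, num_ids)

    def forward(self, rgb, gray, clothes_masked):
        feats = [self.rgb_enc(rgb), self.gray_enc(gray), self.mask_enc(clothes_masked)]
        fused = self.attn(feats)                                # (B, C, H', W')
        embedding = F.adaptive_avg_pool2d(fused, 1).flatten(1)  # (B, C)
        return embedding, self.classifier(embedding)


if __name__ == "__main__":
    model = AENetSketch()
    rgb = torch.randn(2, 3, 256, 128)
    gray = rgb.mean(dim=1, keepdim=True)  # crude grayscale conversion
    masked = rgb  # in practice: rgb * (1 - clothing mask) from a human parser
    emb, logits = model(rgb, gray, masked)
    print(emb.shape, logits.shape)  # torch.Size([2, 128]) torch.Size([2, 150])
```

In this reading, the grayscale branch removes color so identity cues cannot hinge on clothing hue, the masked branch sees only clothing-irrelevant regions (e.g., head and body shape from a human-parsing mask), and the attention weights let the network lean on whichever branch is most reliable per image and per scale.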




Updated: 2024-11-08