Data-efficient multimodal human action recognition for proactive human–robot collaborative assembly: A cross-domain few-shot learning approach
Robotics and Computer-Integrated Manufacturing (IF 9.1), Pub Date: 2024-05-15, DOI: 10.1016/j.rcim.2024.102785
Tianyu Wang, Zhihao Liu, Lihui Wang, Mian Li, Xi Vincent Wang

With the recent vision of Industry 5.0, the cognitive capability of robots plays a crucial role in advancing proactive human–robot collaborative assembly. As a basis of mutual empathy, the understanding of a human operator's intention has been primarily studied through the technique of human action recognition. Existing deep learning-based methods demonstrate remarkable efficacy in handling information-rich data such as physiological measurements and videos, where the latter category represents a more natural perception input. However, deploying these methods in new, unseen assembly scenarios requires first collecting abundant case-specific data, which leads to significant manual effort and poor flexibility. To deal with this issue, this paper proposes a novel cross-domain few-shot learning method for data-efficient multimodal human action recognition. A hierarchical data fusion mechanism is designed to jointly leverage skeletons, RGB images and depth maps with complementary information. A temporal CrossTransformer is then developed to enable action recognition with a very limited amount of data. Lightweight domain adapters are integrated to further improve generalization through fast finetuning. Extensive experiments on a real car engine assembly case show the superior performance of the proposed method over the state of the art regarding both accuracy and finetuning efficiency. Real-time demonstrations and an ablation study further indicate the potential of early recognition, which benefits robot procedure generation in practical applications. In summary, this paper contributes to the rarely explored realm of data-efficient human action recognition for proactive human–robot collaboration.
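The pipeline the abstract outlines (per-modality encoding of skeletons, RGB images and depth maps, hierarchical fusion, a CrossTransformer-style few-shot matcher over frames, and a lightweight adapter finetuned on the new domain) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: all module names, dimensions, the concat-and-project fusion, and the distance-based scoring below are assumptions for illustration only.

```python
# Hypothetical sketch of a multimodal few-shot action-recognition pipeline.
# Not the paper's code; shapes, fusion scheme and scoring are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Projects one modality's per-frame features into a shared embedding space."""
    def __init__(self, in_dim: int, dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):          # x: (batch, frames, in_dim)
        return self.proj(x)        # (batch, frames, dim)

class Adapter(nn.Module):
    """Lightweight bottleneck adapter: the only part finetuned on a new domain."""
    def __init__(self, dim: int = 256, bottleneck: int = 32):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.relu(self.down(x)))   # residual bottleneck

class FewShotTemporalMatcher(nn.Module):
    """Fuses modalities, applies the adapter, then scores a query clip by
    cross-attending over the frames of each support class (CrossTransformer-style)."""
    def __init__(self, in_dims: dict, dim: int = 256):
        super().__init__()
        self.encoders = nn.ModuleDict({m: ModalityEncoder(d, dim) for m, d in in_dims.items()})
        self.fuse = nn.Linear(dim * len(in_dims), dim)   # simple concat-and-project fusion
        self.adapter = Adapter(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def embed(self, clips: dict):
        feats = [self.encoders[m](x) for m, x in clips.items()]
        return self.adapter(self.fuse(torch.cat(feats, dim=-1)))   # (batch, frames, dim)

    def forward(self, support: dict, support_labels, query: dict):
        s, q = self.embed(support), self.embed(query)               # (Ns,T,D), (Nq,T,D)
        scores = []
        for c in support_labels.unique():
            proto = s[support_labels == c].flatten(0, 1).unsqueeze(0)      # (1, Nc*T, D)
            proto = proto.expand(q.size(0), -1, -1)
            attended, _ = self.attn(q, proto, proto)    # query frames attend to class frames
            scores.append(-((q - attended) ** 2).mean(dim=(1, 2)))  # negative distance as logit
        return torch.stack(scores, dim=1)               # (Nq, num_classes)

# Toy usage: a 5-way 1-shot episode with three modalities over 8 frames.
dims = {"skeleton": 75, "rgb": 512, "depth": 512}
model = FewShotTemporalMatcher(dims)
support = {m: torch.randn(5, 8, d) for m, d in dims.items()}
query = {m: torch.randn(2, 8, d) for m, d in dims.items()}
logits = model(support, torch.arange(5), query)
print(logits.shape)  # torch.Size([2, 5])
```

In a cross-domain few-shot setting of this kind, the encoders and attention would typically be pretrained on source-domain assembly data and frozen, while only the small adapter is finetuned on a handful of clips from the new assembly scenario, which is what makes the fast finetuning cheap.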

Last updated: 2024-05-15