Hand-Object Pose Estimation and Reconstruction Based on Signed Distance Field and Multiscale Feature Interaction
IEEE Transactions on Industrial Informatics (IF 11.7) Pub Date: 2024-05-29, DOI: 10.1109/tii.2024.3383542
Xinkang Zhang, Xiaokun Dai, Ziqun Zhang, Xinhan Di, Xinrong Chen

The reconstruction of hands and objects from monocular color images has garnered considerable attention in recent years. In existing methods, parametric models are constructed at a single scale and the interaction between hands and objects is not fully explored, so the multiscale information in 2D images cannot be fully exploited. At the same time, the lack of feature fusion and the insufficient use of labels also degrade reconstruction accuracy. To address these limitations, a new framework comprising three key modules is proposed. First, a multiscale feature extractor generates a multiscale feature representation that captures the interaction between the hand and the object more effectively. Second, an attention-based bridge establishes the connection between the hand and object representations and facilitates their integration. Last, a token-merging module is introduced into the framework to provide a segmentation representation of the object. Experimental results on two datasets, Obman and DexYCB, demonstrate that the proposed method performs well, achieving a shape error of about 0.121 $\text{cm}^{2}$ on Obman and 0.40 $\text{cm}^{2}$ on DexYCB, outperforming state-of-the-art methods. This study may give human-computer interaction methods broader application prospects.
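For illustration only, the sketch below shows one way an attention-based bridge between hand and object features could be implemented in PyTorch. The module name `CrossAttentionBridge`, the bidirectional cross-attention design, and all dimensions are assumptions for this example and are not taken from the paper.

```python
# Minimal sketch (assumption, not the paper's code): a cross-attention "bridge"
# that lets hand and object feature tokens exchange information at one scale.
import torch
import torch.nn as nn


class CrossAttentionBridge(nn.Module):
    """Fuses hand and object token sequences with bidirectional cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.hand_from_obj = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.obj_from_hand = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_hand = nn.LayerNorm(dim)
        self.norm_obj = nn.LayerNorm(dim)

    def forward(self, hand_tokens: torch.Tensor, obj_tokens: torch.Tensor):
        # hand_tokens, obj_tokens: (B, N, dim) flattened feature maps.
        # Each stream attends to the other, then adds a residual connection.
        hand_att, _ = self.hand_from_obj(hand_tokens, obj_tokens, obj_tokens)
        obj_att, _ = self.obj_from_hand(obj_tokens, hand_tokens, hand_tokens)
        hand_out = self.norm_hand(hand_tokens + hand_att)
        obj_out = self.norm_obj(obj_tokens + obj_att)
        return hand_out, obj_out


# Usage with dummy tokens (e.g. a 14x14 feature map flattened to 196 tokens).
bridge = CrossAttentionBridge(dim=256, num_heads=8)
hand = torch.randn(2, 196, 256)
obj = torch.randn(2, 196, 256)
hand_fused, obj_fused = bridge(hand, obj)
print(hand_fused.shape, obj_fused.shape)  # torch.Size([2, 196, 256]) for both
```

In a multiscale setting, one such bridge could be applied per feature scale so that hand and object cues are exchanged before pose and shape regression.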
