NeuralFeels with neural fields: Visuotactile perception for in-hand manipulation
Science Robotics (IF 26.1), Pub Date: 2024-11-13, DOI: 10.1126/scirobotics.adl0628. Sudharshan Suresh, Haozhi Qi, Tingfan Wu, Taosha Fan, Luis Pineda, Mike Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, Joseph Ortiz, Mustafa Mukadam
To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object’s pose and shape. The status quo for in-hand perception primarily uses vision and is restricted to tracking a priori known objects. Moreover, visual occlusion of objects in hand is imminent during manipulation, preventing current systems from pushing beyond tasks without occlusion. We combined vision and touch sensing on a multifingered hand to estimate an object’s pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We studied multimodal in-hand perception in simulation and the real world, interacting with different objects via a proprioception-driven policy. Our experiments showed final reconstruction F scores of 81% and average pose drifts of 4.7 millimeters, which was further reduced to 2.3 millimeters with known object models. In addition, we observed that, under heavy visual occlusion, we could achieve improvements in tracking up to 94% compared with vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step toward benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone toward advancing robot dexterity.
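The abstract describes two tightly coupled components: an object-centric neural field learned online to encode geometry, and a pose-graph optimization that tracks the object against that field using both vision and touch. As a rough illustrative sketch of how those two pieces can interlock (not the released NeuralFeels implementation), the PyTorch snippet below fits a small signed-distance MLP to fused visual/tactile surface points and refines the object pose by driving measured points onto the field's zero level set; the names NeuralSDF, shape_step, and pose_step are hypothetical, and a plain gradient-based pose refinement stands in for the paper's pose-graph solve.

```python
import torch
import torch.nn as nn

class NeuralSDF(nn.Module):
    """Small MLP mapping 3D points in the object frame to signed distance."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts):
        return self.net(pts).squeeze(-1)

def skew(w):
    """3x3 skew-symmetric matrix of a 3-vector (small-angle rotation update)."""
    wx = torch.zeros(3, 3)
    wx[0, 1], wx[0, 2] = -w[2], w[1]
    wx[1, 0], wx[1, 2] = w[2], -w[0]
    wx[2, 0], wx[2, 1] = -w[1], w[0]
    return wx

def shape_step(sdf, shape_opt, surface_pts):
    """One online shape update: fused visual/tactile surface points (in the
    object frame) should evaluate to zero signed distance."""
    shape_opt.zero_grad()
    loss = sdf(surface_pts).abs().mean()
    loss.backward()
    shape_opt.step()
    return loss.item()

def pose_step(sdf, R, t, world_pts, iters=50, lr=1e-2):
    """Refine the object pose so measured world-frame points land on the
    current SDF zero level set (a stand-in for the pose-graph optimization)."""
    dw = torch.zeros(3, requires_grad=True)   # small rotation (axis-angle)
    dt = torch.zeros(3, requires_grad=True)   # translation increment
    opt = torch.optim.Adam([dw, dt], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        R_new = R @ (torch.eye(3) + skew(dw))   # first-order rotation update
        t_new = t + dt
        obj_pts = (world_pts - t_new) @ R_new   # world frame -> object frame
        sdf(obj_pts).pow(2).mean().backward()
        opt.step()
    # Gradients also accumulate on the field's weights here; shape_step()
    # zeroes them before its own update, so they are harmless in this sketch.
    with torch.no_grad():
        return R @ (torch.eye(3) + skew(dw)), t + dt
```

A hypothetical usage pattern would alternate the two steps as new frames arrive: construct sdf = NeuralSDF() with its own optimizer, call shape_step on each batch of fused surface points, then pose_step to update the tracked pose against the refreshed field. The sketch only conveys this coupling between online field learning and pose refinement; the paper's system instead solves the tracking as a pose-graph problem jointly with the neural field, as stated in the abstract.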
Updated: 2024-11-13