General-purpose foundation models for increased autonomy in robot-assisted surgery
Nature Machine Intelligence (IF 18.8) Pub Date: 2024-11-01, DOI: 10.1038/s42256-024-00917-4
Samuel Schmidgall, Ji Woong Kim, Alan Kuntz, Ahmed Ezzat Ghazi, Axel Krieger
The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem, such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise when such models are trained on large, diverse, task-agnostic datasets of video demonstrations. These models exhibit impressive generalization to unseen circumstances, especially as data volume and model capacity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for several reasons: there is a lack of large-scale open-source data for training models; the soft-body deformations these robots encounter during surgery are challenging to model, because simulation cannot match the physical and visual complexity of biological tissue; and surgical robots risk harming patients when tested in clinical trials and therefore require more extensive safety measures. This Perspective aims to provide a path towards increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision–language–action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models, and we provide four guiding actions towards increased autonomy in robot-assisted surgery.