A stable method for task priority adaptation in quadratic programming via reinforcement learning
Robotics and Computer-Integrated Manufacturing (IF 9.1), Pub Date: 2024-08-30, DOI: 10.1016/j.rcim.2024.102857. Andrea Testa, Marco Laghi, Edoardo Del Bianco, Gennaro Raiola, Enrico Mingo Hoffman, Arash Ajoudani
In emerging manufacturing facilities, robots must become more flexible. They are expected to perform complex jobs, exhibiting different behaviors as needed, all within unstructured environments and without requiring reprogramming or setup adjustments. To address this challenge, we introduce A3CQP, a non-strict hierarchical Quadratic Programming (QP) controller. It seamlessly combines motion and interaction functionalities, with priorities dynamically and autonomously adapted through a Reinforcement Learning-based adaptation module. This module uses the Asynchronous Advantage Actor-Critic (A3C) algorithm to ensure rapid convergence and stable training within continuous action and observation spaces. Experimental validation, involving a collaborative peg-in-hole assembly and the polishing of a wooden plate, demonstrates the effectiveness of the proposed solution in terms of automatic adaptability, responsiveness, flexibility, and safety.
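To make the idea of a non-strict hierarchy concrete, the sketch below (not the authors' implementation) shows a soft-priority QP in which each task's priority enters the objective as a scalar weight, min over qdot of sum_i w_i * ||J_i qdot - xdot_i||^2 plus a regularization term. In the paper's architecture, an RL policy would output the weights w_i online; here they are fixed, and all task names and dimensions are hypothetical.

```python
# Minimal sketch of a soft-priority (non-strict hierarchical) QP.
# Priorities are weights in one least-squares objective, solvable in
# closed form; an RL module could adapt the weights at each control step.
import numpy as np

def soft_priority_qp(jacobians, task_vels, weights, reg=1e-6):
    """Solve min_qdot sum_i w_i ||J_i qdot - xdot_i||^2 + reg ||qdot||^2.

    jacobians : list of (m_i x n) task Jacobians J_i
    task_vels : list of desired task-space velocities xdot_i, shape (m_i,)
    weights   : non-negative priorities w_i (larger = higher priority)
    reg       : Tikhonov regularization on joint velocities
    """
    n = jacobians[0].shape[1]
    # Stack sqrt(w_i)*J_i and sqrt(w_i)*xdot_i into one least-squares system.
    A = np.vstack([np.sqrt(w) * J for J, w in zip(jacobians, weights)])
    b = np.concatenate([np.sqrt(w) * x for x, w in zip(task_vels, weights)])
    # Regularized normal equations: (A^T A + reg*I) qdot = A^T b
    return np.linalg.solve(A.T @ A + reg * np.eye(n), A.T @ b)

# Hypothetical example: a motion task and an interaction task on a 7-DoF arm.
rng = np.random.default_rng(0)
J_motion, J_force = rng.standard_normal((6, 7)), rng.standard_normal((3, 7))
xd_motion, xd_force = rng.standard_normal(6), rng.standard_normal(3)
# In A3CQP the RL policy would produce these weights; fixed here for brevity.
qdot = soft_priority_qp([J_motion, J_force], [xd_motion, xd_force], [0.8, 0.2])
print(qdot)
```

Because all tasks share one objective, lower-priority tasks are traded off smoothly rather than strictly nullified, which is what lets a learned policy shift priorities continuously between motion and interaction behaviors.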
Updated: 2024-08-30