Federated reinforcement learning for robot motion planning with zero-shot generalization
Automatica (IF 4.8) Pub Date: 2024-05-25, DOI: 10.1016/j.automatica.2024.111709
Zhenyuan Yuan, Siyuan Xu, Minghui Zhu

This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection or policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning between multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to them. Each learner then selects between its local control policy and the one from the Cloud for the next iteration. The proposed framework leverages the derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement, and the optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.
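The abstract describes an iterative upload-select-broadcast protocol. Below is a minimal sketch of one such round, under the assumption (not stated in the paper) that the Cloud's "global optimum" is the uploaded policy with the lowest estimated normalized arrival time; all names (Learner, cloud_select, federated_round) are hypothetical illustrations, not the authors' implementation.

from dataclasses import dataclass
from typing import Callable, List

# Placeholder policy type: maps a state to an action.
Policy = Callable[[object], object]

@dataclass
class Learner:
    policy: Policy
    arrival_time: float  # estimated normalized arrival time of `policy`

    def local_update(self) -> None:
        """Placeholder for the learner's local RL update (details are in the paper)."""
        ...

def cloud_select(learners: List[Learner]) -> Learner:
    # Cloud step: compute the global optimum among the uploaded
    # (policy, arrival-time) pairs, assumed here to be the minimizer
    # of estimated normalized arrival time.
    return min(learners, key=lambda ln: ln.arrival_time)

def federated_round(learners: List[Learner]) -> None:
    best = cloud_select(learners)  # learners upload; Cloud selects and broadcasts
    for ln in learners:
        # Each learner chooses between its local policy and the Cloud's
        # for the next iteration; only policies and scalar estimates are
        # exchanged, never raw data.
        if best.arrival_time < ln.arrival_time:
            ln.policy = best.policy
            ln.arrival_time = best.arrival_time

Note that in this sketch the only quantities that cross the network are each learner's policy and a scalar arrival-time estimate, which is consistent with the abstract's claim that raw data is never shared.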
