A distributed adaptive policy gradient method based on momentum for multi-agent reinforcement learning
Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-07-12, DOI: 10.1007/s40747-024-01529-6
Junru Shi, Xin Wang, Mingchuan Zhang, Muhua Liu, Junlong Zhu, Qingtao Wu

The Policy Gradient (PG) method is one of the most popular algorithms in Reinforcement Learning (RL). However, distributed adaptive variants of PG have rarely been studied in multi-agent settings. For this reason, this paper proposes a distributed adaptive policy gradient algorithm (IS-DAPGM) that incorporates Adam-type updates and an importance-sampling technique. Furthermore, we establish a theoretical convergence rate of \(\mathcal {O}(1/\sqrt{T})\), where T is the number of iterations, which matches the convergence rate of state-of-the-art centralized policy gradient methods. In addition, extensive experiments are conducted in a multi-agent environment, a modified version of the Particle World environment. By comparing against other distributed PG methods and varying the number of agents, we verify that IS-DAPGM is more efficient than existing methods.
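The abstract describes three ingredients: an importance-sampling correction of each agent's policy gradient, a consensus step over the communication network, and an Adam-type momentum update. The sketch below illustrates how these pieces could fit together in a single synchronous iteration; it is not the authors' exact algorithm, and all function names and hyper-parameters (learning rate, beta coefficients, mixing matrix W) are illustrative assumptions.

```python
import numpy as np

def is_dapgm_step(theta, m, v, grads, is_weights, W, t,
                  lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical single iteration of an IS-DAPGM-style update.

    theta:      (n_agents, dim) current policy parameters of each agent
    m, v:       (n_agents, dim) first/second moment estimates (Adam-type)
    grads:      (n_agents, dim) per-agent policy gradient estimates
    is_weights: (n_agents,) importance-sampling ratios for reused trajectories
    W:          (n_agents, n_agents) doubly-stochastic mixing matrix of the network
    t:          iteration counter (1-based), used for bias correction
    """
    # Importance sampling: reweight each agent's gradient estimate so that
    # trajectories collected under an older policy can be reused.
    g = is_weights[:, None] * grads

    # Consensus step: each agent averages parameters with its neighbours
    # according to the communication graph encoded in W.
    theta_mix = W @ theta

    # Adam-type momentum update, applied locally by every agent.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Gradient ascent on the expected return.
    theta_new = theta_mix + lr * m_hat / (np.sqrt(v_hat) + eps)

    return theta_new, m, v
```

Under this reading, the consensus step keeps the agents' parameters close to each other while the adaptive momentum terms handle the noisy, importance-weighted gradient estimates, which is consistent with the \(\mathcal {O}(1/\sqrt{T})\) rate stated in the abstract.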




Updated: 2024-07-12