Complex & Intelligent Systems (IF 5.0). Pub Date: 2024-07-12. DOI: 10.1007/s40747-024-01529-6. Junru Shi, Xin Wang, Mingchuan Zhang, Muhua Liu, Junlong Zhu, Qingtao Wu
The Policy Gradient (PG) method is one of the most popular algorithms in Reinforcement Learning (RL). However, distributed adaptive variants of PG are rarely studied in multi-agent settings. For this reason, this paper proposes a distributed adaptive policy gradient algorithm (IS-DAPGM) that incorporates Adam-type updates and an importance sampling technique. Furthermore, we establish a theoretical convergence rate of \(\mathcal {O}(1/\sqrt{T})\), where T denotes the number of iterations; this matches the convergence rate of state-of-the-art centralized policy gradient methods. In addition, extensive experiments are conducted in a multi-agent environment built as a modification of the Particle world environment. By comparing against other distributed PG methods and varying the number of agents, we verify that IS-DAPGM is more efficient than the existing methods.
Title (translated): A momentum-based distributed adaptive policy gradient method for multi-agent reinforcement learning
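The abstract combines three ingredients: a policy gradient estimator, an importance-sampling correction, and an Adam-type adaptive step. The sketch below is a hypothetical single-agent toy, not the paper's IS-DAPGM (which is distributed across agents); all names, hyperparameters, and the bandit setup are illustrative. It shows how the pieces fit together: actions are drawn from a periodically refreshed behavior policy, the REINFORCE gradient is reweighted by the importance ratio of the current policy to the behavior policy, and the result feeds an Adam-style update.

```python
import numpy as np

# Toy sketch (illustrative only, not the paper's IS-DAPGM): a softmax policy
# on a 2-armed bandit, trained with an importance-sampling-weighted policy
# gradient and an Adam-type adaptive update.

rng = np.random.default_rng(0)
theta = np.zeros(2)                      # policy logits for 2 actions
m, v = np.zeros(2), np.zeros(2)          # Adam first/second moment estimates
beta1, beta2, lr, eps = 0.9, 0.999, 0.1, 1e-8
true_rewards = np.array([0.2, 1.0])      # action 1 is better in expectation
baseline = 0.0                           # running-average reward baseline

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

theta_behavior = theta.copy()
for t in range(1, 501):
    if t % 10 == 1:
        theta_behavior = theta.copy()    # refresh behavior policy periodically
    behavior = softmax(theta_behavior)   # policy that generates samples
    a = rng.choice(2, p=behavior)
    r = true_rewards[a] + 0.1 * rng.standard_normal()

    current = softmax(theta)             # policy being optimized
    ratio = current[a] / behavior[a]     # importance-sampling weight
    baseline += 0.05 * (r - baseline)    # track average reward
    adv = r - baseline                   # advantage estimate

    grad_logp = -current                 # grad of log pi(a | theta) for softmax
    grad_logp[a] += 1.0
    g = ratio * adv * grad_logp          # IS-weighted ascent direction

    m = beta1 * m + (1 - beta1) * g      # Adam-type update (gradient ascent)
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta += lr * m_hat / (np.sqrt(v_hat) + eps)

# Probability mass should concentrate on the better action (action 1).
print(softmax(theta))
```

The behavior-policy snapshot makes the importance ratio non-trivial: samples are reused under a slightly stale policy, and the ratio corrects for the mismatch, which is the same role importance sampling plays in the abstract's algorithm.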