Complex & Intelligent Systems (IF 5.0). Pub Date: 2024-07-12. DOI: 10.1007/s40747-024-01529-6. Junru Shi, Xin Wang, Mingchuan Zhang, Muhua Liu, Junlong Zhu, Qingtao Wu
The Policy Gradient (PG) method is one of the most popular algorithms in Reinforcement Learning (RL). However, distributed adaptive variants of PG are rarely studied in multi-agent settings. For this reason, this paper proposes a distributed adaptive policy gradient algorithm (IS-DAPGM) that incorporates Adam-type updates and an importance sampling technique. Furthermore, we establish a theoretical convergence rate of \(\mathcal {O}(1/\sqrt{T})\), where T denotes the number of iterations; this matches the convergence rate of state-of-the-art centralized policy gradient methods. In addition, extensive experiments are conducted in a multi-agent environment built as a modification of the Particle world environment. By comparing against other distributed PG methods and varying the number of agents, we verify that IS-DAPGM is more efficient than the existing methods.
Title (translated): A momentum-based distributed adaptive policy gradient method for multi-agent reinforcement learning
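The abstract combines three ingredients: a policy gradient estimator, an importance-sampling correction, and an Adam-type adaptive step. The sketch below is a hypothetical single-agent toy, not the paper's IS-DAPGM (which is distributed across agents); all names, hyperparameters, and the bandit setup are illustrative. It shows how the pieces fit together: actions are drawn from a periodically refreshed behavior policy, the REINFORCE gradient is reweighted by the importance ratio of the current policy to the behavior policy, and the result feeds an Adam-style update.

```python
import numpy as np

# Toy sketch (illustrative only, not the paper's IS-DAPGM): a softmax policy
# on a 2-armed bandit, trained with an importance-sampling-weighted policy
# gradient and an Adam-type adaptive update.

rng = np.random.default_rng(0)
theta = np.zeros(2)                      # policy logits for 2 actions
m, v = np.zeros(2), np.zeros(2)          # Adam first/second moment estimates
beta1, beta2, lr, eps = 0.9, 0.999, 0.1, 1e-8
true_rewards = np.array([0.2, 1.0])      # action 1 is better in expectation
baseline = 0.0                           # running-average reward baseline

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

theta_behavior = theta.copy()
for t in range(1, 501):
    if t % 10 == 1:
        theta_behavior = theta.copy()    # refresh behavior policy periodically
    behavior = softmax(theta_behavior)   # policy that generates samples
    a = rng.choice(2, p=behavior)
    r = true_rewards[a] + 0.1 * rng.standard_normal()

    current = softmax(theta)             # policy being optimized
    ratio = current[a] / behavior[a]     # importance-sampling weight
    baseline += 0.05 * (r - baseline)    # track average reward
    adv = r - baseline                   # advantage estimate

    grad_logp = -current                 # grad of log pi(a | theta) for softmax
    grad_logp[a] += 1.0
    g = ratio * adv * grad_logp          # IS-weighted ascent direction

    m = beta1 * m + (1 - beta1) * g      # Adam-type update (gradient ascent)
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta += lr * m_hat / (np.sqrt(v_hat) + eps)

# Probability mass should concentrate on the better action (action 1).
print(softmax(theta))
```

The behavior-policy snapshot makes the importance ratio non-trivial: samples are reused under a slightly stale policy, and the ratio corrects for the mismatch, which is the same role importance sampling plays in the abstract's algorithm.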