Self-organizing optimization and phase transition in a reinforcement-learning minority game system
Frontiers of Physics ( IF 6.5 ) Pub Date : 2024-01-24 , DOI: 10.1007/s11467-023-1378-z
Si-Ping Zhang , Jia-Qi Dong , Hui-Yu Zhang , Yi-Xuan Lü , Jue Wang , Zi-Gang Huang

Whether a complex game system composed of a large number of artificial intelligence (AI) agents empowered with reinforcement learning can produce highly favorable collective behaviors purely through agent self-exploration is a question of practical importance. In this paper, we address this question by combining the minority game model, a canonical theoretical model of resource-allocation systems, with reinforcement learning. Each individual participating in the game is endowed with a degree of intelligence via a reinforcement learning algorithm. In particular, we demonstrate that, as the AI agents gradually become familiar with the unknown environment and try to take optimal actions to maximize their payoffs, the whole system continuously approaches the optimal state under certain parameter combinations; herding is effectively suppressed by an oscillating collective behavior, a self-organizing pattern that emerges without any external interference. Interestingly, numerical results reveal a first-order phase transition in our multi-agent system with reinforcement learning. To further understand the dynamics of agent learning, we define and analyze the conversion paths of belief modes, and find that self-organizing condensation of belief modes appears at given trial-and-error rates in the AI system. Finally, we provide a detection method, based on the Kullback–Leibler divergence, for the emergence of the period-two oscillating collective pattern, and identify the parameter region where it appears.
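The setup described above can be sketched in code. The following is a minimal illustration, not the paper's actual model: it uses a stateless Q-learning (two-armed bandit) rule for each agent, with illustrative parameter values `ALPHA` and `EPSILON` standing in for the paper's trial-and-error rates. The final lines show the idea behind the Kullback–Leibler-based period-two detector, comparing the attendance distributions on even and odd rounds.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 201          # odd number of agents, so a strict minority always exists
ALPHA = 0.1      # learning rate (illustrative, not the paper's value)
EPSILON = 0.02   # exploration (trial-and-error) rate, also illustrative
ROUNDS = 5000

# Each agent keeps a Q-value for the two actions {0, 1}.
Q = np.zeros((N, 2))

attendance = []  # number of agents choosing action 1 in each round

for t in range(ROUNDS):
    # Epsilon-greedy choice: explore with probability EPSILON, else greedy.
    explore = rng.random(N) < EPSILON
    greedy = Q.argmax(axis=1)
    actions = np.where(explore, rng.integers(0, 2, N), greedy)

    n1 = int(actions.sum())
    minority = 1 if n1 < N / 2 else 0   # the less-chosen action wins
    rewards = np.where(actions == minority, 1.0, -1.0)

    # Stateless Q-learning update for the chosen action only.
    idx = np.arange(N)
    Q[idx, actions] += ALPHA * (rewards - Q[idx, actions])

    attendance.append(n1)

# Volatility sigma^2/N measures distance from the optimal resource split;
# values below the random-choice baseline of 0.25 indicate coordination.
a = np.array(attendance[ROUNDS // 2:])
print("sigma^2 / N =", a.var() / N)

# Period-two detection sketch: compare attendance distributions on even and
# odd rounds via the Kullback-Leibler divergence; a large value signals the
# period-two oscillating collective pattern.
bins = np.arange(N + 2)
p_even, _ = np.histogram(a[::2], bins=bins, density=True)
p_odd, _ = np.histogram(a[1::2], bins=bins, density=True)
eps = 1e-12  # smoothing to avoid division by zero
kl = np.sum(p_even * np.log((p_even + eps) / (p_odd + eps)))
print("KL(even || odd) =", kl)
```

The detector exploits the fact that under a period-two oscillation the crowd alternates between over- and under-attendance, so the even-round and odd-round attendance histograms separate and their divergence grows.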



Updated: 2024-01-24