Maximum diffusion reinforcement learning
Nature Machine Intelligence (IF 18.8). Pub Date: 2024-05-02. DOI: 10.1038/s42256-024-00829-3
Thomas A. Berrueta, Allison Pinosky, Todd D. Murphey

Robots and animals both experience the world through their bodies and senses. Their embodiment constrains their experiences, ensuring that they unfold continuously in space and time. As a result, the experiences of embodied agents are intrinsically correlated. Correlations create fundamental challenges for machine learning, as most techniques rely on the assumption that data are independent and identically distributed. In reinforcement learning, where data are directly collected from an agent’s sequential experiences, violations of this assumption are often unavoidable. Here we derive a method that overcomes this issue by exploiting the statistical mechanics of ergodic processes, which we term maximum diffusion reinforcement learning. By decorrelating agent experiences, our approach provably enables single-shot learning in continuous deployments over the course of individual task attempts. Moreover, we prove our approach generalizes well-known maximum entropy techniques and robustly exceeds state-of-the-art performance across popular benchmarks. Our results at the nexus of physics, learning and control form a foundation for transparent and reliable decision-making in embodied reinforcement learning agents.
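As a rough illustration of the correlation problem the abstract describes, the following minimal NumPy sketch simulates a one-dimensional agent whose motion carries momentum, a stand-in for embodiment constraints, and measures the lag-1 autocorrelation of its experienced state increments. The dynamics, parameters, and helper functions here are hypothetical illustrations, not the authors' method; they show only that smoother, more constrained motion yields more strongly correlated data, while near-diffusive motion approaches the i.i.d. regime.

```python
# A minimal sketch (not the paper's implementation) of the abstract's premise:
# sequential experiences from an embodied agent are temporally correlated,
# violating the i.i.d. assumption most learning methods rely on, while more
# diffusive motion decorrelates them. All dynamics here are hypothetical.

import numpy as np

rng = np.random.default_rng(0)


def rollout(momentum: float, n_steps: int = 5000) -> np.ndarray:
    """Simulate a 1D agent whose velocity persists over time.

    High `momentum` -> smooth, strongly correlated trajectories
    (an embodiment-like constraint); zero `momentum` -> nearly
    diffusive, weakly correlated motion.
    """
    x, v = 0.0, 0.0
    states = np.empty(n_steps)
    for t in range(n_steps):
        v = momentum * v + (1.0 - momentum) * rng.normal()
        x += v
        states[t] = x
    return states


def lag_autocorr(x: np.ndarray, lag: int = 1) -> float:
    """Autocorrelation of state increments at a given lag (i.i.d. -> ~0)."""
    dx = np.diff(x)
    return float(np.corrcoef(dx[:-lag], dx[lag:])[0, 1])


for momentum in (0.95, 0.5, 0.0):
    rho = lag_autocorr(rollout(momentum))
    print(f"momentum={momentum:.2f}  lag-1 increment autocorrelation={rho:+.3f}")
```

Running the sketch prints autocorrelations near 0.95, 0.5 and 0 for the three momentum settings, which makes concrete why decorrelating experiences, as maximum diffusion reinforcement learning aims to do, restores the statistical footing that standard techniques assume.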



