Nonuniqueness and convergence to equivalent solutions in observer-based inverse reinforcement learning
Automatica (IF 4.8) Pub Date: 2024-10-25, DOI: 10.1016/j.automatica.2024.111977. Jared Town, Zachary Morrison, Rushikesh Kamalapurkar
A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real time is the existence of multiple solutions. Nonuniqueness necessitates the study of equivalent solutions, i.e., solutions that result in a different cost functional but the same feedback matrix. While offline algorithms that converge to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
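To make the notion of equivalent solutions concrete, the following is a minimal sketch (not taken from the paper) for the linear-quadratic special case: scaling the cost weights Q and R by the same positive constant produces a different cost functional but the identical optimal feedback matrix K. The system matrices and the scaling constant c below are arbitrary choices for illustration.

```python
# Illustrative sketch of "equivalent solutions": two different quadratic costs
# (Q, R) and (c*Q, c*R) yield the same LQR feedback matrix K.
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical system matrices, chosen only for illustration.
A = np.array([[0.0, 1.0],
              [-1.0, -2.0]])
B = np.array([[0.0],
              [1.0]])

def lqr_gain(Q, R):
    """Feedback matrix K = R^{-1} B^T P from the continuous-time algebraic Riccati equation."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

Q = np.eye(2)
R = np.eye(1)
c = 5.0  # any positive scalar

K1 = lqr_gain(Q, R)
K2 = lqr_gain(c * Q, c * R)  # different cost functional, same optimal feedback

print(np.allclose(K1, K2))  # True: the two costs are equivalent solutions of the IRL problem
```

The invariance follows from the Riccati equation: if P solves it for (Q, R), then cP solves it for (cQ, cR), and the scaling cancels in K = R^{-1} B^T P, which is why an IRL algorithm can only hope to recover a cost that is equivalent to, rather than identical with, the expert's.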
Updated: 2024-10-25