Nonuniqueness and convergence to equivalent solutions in observer-based inverse reinforcement learning


Automatica (IF 4.8), Pub Date: 2024-10-25, DOI: 10.1016/j.automatica.2024.111977
Jared Town, Zachary Morrison, Rushikesh Kamalapurkar

A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real time is the existence of multiple solutions. Nonuniqueness necessitates the study of equivalent solutions, i.e., solutions that result in a different cost functional but the same feedback matrix. While offline algorithms that converge to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
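The notion of equivalent solutions can be illustrated with a minimal sketch. It assumes a scalar linear-quadratic setting, which the abstract does not state explicitly; the function name `scalar_lqr_gain` and all numeric values are hypothetical and are not taken from the paper. The sketch shows only the simplest source of nonuniqueness: scaling the state and control weights by the same positive constant changes the cost functional but leaves the optimal feedback gain unchanged.

```python
import math


def scalar_lqr_gain(a, b, q, r):
    """Optimal gain k (control u = -k * x) for the scalar system
    xdot = a * x + b * u with cost integral of (q * x^2 + r * u^2).

    Assumes b != 0, q > 0, r > 0 (illustrative assumptions).
    """
    # Stabilizing root of the scalar Riccati equation
    #   2 * a * p - (b^2 / r) * p^2 + q = 0
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    # Feedback gain k = b * p / r
    return b * p / r


# Hypothetical system and two different cost functionals.
a, b = 1.0, 2.0
k1 = scalar_lqr_gain(a, b, q=3.0, r=1.0)    # original weights
k2 = scalar_lqr_gain(a, b, q=30.0, r=10.0)  # both weights scaled by 10

print("gain with (q, r) = (3, 1):  ", k1)
print("gain with (q, r) = (30, 10):", k2)
print("gains match:", math.isclose(k1, k2))
```

An observer of the expert's feedback behavior alone cannot distinguish these two cost functionals, which is why convergence to an equivalent solution, rather than to the exact cost, is the natural target. In the matrix case the set of equivalent solutions can be larger than a scaling family; this sketch does not cover that.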


Updated: 2024-10-25
