Zero-shot sim-to-real transfer using Siamese-Q-Based reinforcement learning
Information Fusion (IF 14.7), Pub Date: 2024-09-06, DOI: 10.1016/j.inffus.2024.102664
Zhenyu Zhang, Shaorong Xie, Han Zhang, Xiangfeng Luo, Hang Yu

To address real-world decision problems with reinforcement learning, it is common to first train a policy in a simulator for safety. Unfortunately, the sim-to-real gap hinders effective simulation-to-real transfer without substantial training data. However, collecting real samples of complex tasks is often impractical, and the sample inefficiency of reinforcement learning exacerbates the simulation-to-real problem even when online interaction or real data is available. Representation learning can improve sample efficiency while preserving generalization by projecting high-dimensional inputs into low-dimensional representations. However, whether trained independently or jointly with reinforcement learning, representation learning remains a separate auxiliary task, lacking task-related features and the generalization needed for simulation-to-real transfer. This paper proposes Siamese-Q, a new representation learning method for zero-shot simulation-to-real transfer that employs Siamese networks to narrow the distance, with respect to Q values, between inputs with the same semantics in the latent space. This allows us to fuse task-related information into the representation and improve the generalization of the policy. Evaluation in virtual and real autonomous vehicle scenarios demonstrates substantial improvements of 19.5% and 94.2%, respectively, over conventional representation learning, without requiring any real-world observations or on-policy interaction, and enables reinforcement learning policies trained in simulation to transfer to reality.
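The abstract only outlines the idea at a high level. As a rough illustration, the following PyTorch sketch shows how a shared (Siamese) encoder might be trained to pull together the latent codes of two semantically equivalent observations while also aligning their Q-value estimates; all names, shapes, and the exact loss form are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a Siamese-Q-style representation loss (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Projects high-dimensional observations into low-dimensional latent representations."""
    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class QHead(nn.Module):
    """Predicts per-action Q-values from a latent representation."""
    def __init__(self, latent_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Linear(latent_dim, n_actions)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def siamese_q_loss(encoder: Encoder, q_head: QHead,
                   obs_a: torch.Tensor, obs_b: torch.Tensor,
                   q_weight: float = 1.0) -> torch.Tensor:
    """obs_a and obs_b are two views with the same task semantics
    (e.g. a simulated frame and an augmented / domain-shifted version of it).
    The shared encoder is pushed to give them nearby latents, and the Q-head
    is pushed to assign them matching Q-values, so the representation keeps
    task-related information rather than only reconstructive features."""
    z_a, z_b = encoder(obs_a), encoder(obs_b)
    latent_loss = F.mse_loss(z_a, z_b)             # pull same-semantics latents together
    q_loss = F.mse_loss(q_head(z_a), q_head(z_b))  # align their Q estimates
    return latent_loss + q_weight * q_loss

# Illustrative usage, combined with a standard RL (e.g. DQN) objective:
# encoder = Encoder(obs_dim=64, latent_dim=16); q_head = QHead(16, n_actions=4)
# loss = siamese_q_loss(encoder, q_head, obs_sim, obs_augmented)
# loss.backward()
```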

Updated: 2024-09-06