Propulsive landing of launchers’ first stages with Deep Reinforcement Learning, Acta Astronautica



Propulsive landing of launchers’ first stages with Deep Reinforcement Learning
Acta Astronautica (IF 3.1), Pub Date: 2024-11-22, DOI: 10.1016/j.actaastro.2024.11.028
Davide Iafrate, Andrea Brandonisio, Robert Hinz, Michèle Lavagna

The planetary landing problem is gaining relevance in the space sector, with applications ranging from unmanned probes landing on other planetary bodies to reusable first and second stages of launch vehicles. Existing methods lack flexibility in handling complex non-linear dynamics, particularly when the constraints cannot be convexified. It is therefore crucial to assess the performance of novel techniques, together with their advantages and disadvantages. This work develops an integrated 6-DOF guidance and control approach, based on reinforcement learning of deep neural network policies, for fuel-optimal planetary landing control, applies it to the terminal landing of a launcher first stage, and assesses its performance and robustness. 3-DOF and 6-DOF simulators are developed and encapsulated in MDP-like (Markov Decision Process) environments compatible with industry standards. Particular care is given to shaping reward functions that lead to a landing that is both successful and fuel-optimal. A cloud pipeline is developed to train an agent effectively with a PPO (Proximal Policy Optimization) reinforcement learning algorithm so that it achieves the landing goal.
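To make the idea of an MDP-like environment with a shaped, fuel-aware reward more concrete, here is a minimal sketch. It is not the authors' simulator: it reduces the problem to a 1-D vertical descent (rather than 3-DOF or 6-DOF), mimics a Gymnasium-style `reset`/`step` interface without importing any library, and every constant, reward weight, and name (`VerticalLandingEnv`, `rollout`, `FUEL_WEIGHT`, `V_SAFE`, and so on) is a hypothetical choice made for illustration.

```python
import random


class VerticalLandingEnv:
    """Toy 1-D powered descent with a Gymnasium-like reset/step interface.

    All numbers below are illustrative assumptions, not values from the paper.
    """

    G0 = 9.81               # gravitational acceleration, m/s^2
    DT = 0.1                # integration step, s
    DRY_MASS = 20_000.0     # kg (hypothetical)
    FUEL_MASS = 5_000.0     # kg (hypothetical)
    MAX_THRUST = 800_000.0  # N (hypothetical)
    ISP = 300.0             # specific impulse, s (hypothetical)
    V_SAFE = 2.0            # maximum touchdown speed counted as a success, m/s
    FUEL_WEIGHT = 0.01      # reward penalty per kg of propellant burned
    MAX_STEPS = 2_000       # episode is truncated after this many steps

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._state = None
        self._steps = 0

    def reset(self):
        # Randomised initial altitude (m) and descent rate (m/s, negative = down).
        altitude = self._rng.uniform(900.0, 1100.0)
        velocity = self._rng.uniform(-90.0, -70.0)
        self._state = (altitude, velocity, self.DRY_MASS + self.FUEL_MASS)
        self._steps = 0
        return self._state, {}

    def step(self, throttle):
        altitude, velocity, mass = self._state
        throttle = min(max(float(throttle), 0.0), 1.0)
        fuel_left = mass - self.DRY_MASS
        thrust = throttle * self.MAX_THRUST
        burned = thrust / (self.ISP * self.G0) * self.DT
        if burned > fuel_left:
            # Scale the thrust down so that only the remaining fuel is burned.
            thrust *= fuel_left / burned
            burned = fuel_left
        # Semi-implicit Euler integration of the vertical dynamics.
        velocity += (thrust / mass - self.G0) * self.DT
        altitude += velocity * self.DT
        mass -= burned
        self._steps += 1

        # Shaped reward: a per-step fuel penalty plus a terminal landing term.
        reward = -self.FUEL_WEIGHT * burned
        terminated = altitude <= 0.0
        if terminated:
            altitude = 0.0
            speed = abs(velocity)
            reward += 100.0 if speed <= self.V_SAFE else -speed
        truncated = (not terminated) and self._steps >= self.MAX_STEPS
        self._state = (altitude, velocity, mass)
        info = {"fuel_left": mass - self.DRY_MASS}
        return self._state, reward, terminated, truncated, info


def rollout(env, policy):
    """Run one episode with policy(obs) -> throttle; return (return, final obs)."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total, obs
```

A call such as `rollout(VerticalLandingEnv(seed=0), lambda obs: 0.0)` simulates an unpowered fall and ends with a large negative return. A trained policy would instead be rewarded for touching down below `V_SAFE` while burning as little propellant as possible. In practice, an environment like this would typically subclass a standard RL environment base class so that an off-the-shelf PPO implementation can train on it directly.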


Updated: 2024-11-22
