Robust solar sail trajectories using proximal policy optimization
Acta Astronautica (IF 3.1), Pub Date: 2024-11-09, DOI: 10.1016/j.actaastro.2024.10.065. Christian Bianchi, Lorenzo Niccolai, Giovanni Mengali
Reinforcement learning is used to design minimum-time trajectories of solar sails subject to the typical sources of uncertainty associated with such a propulsion system, i.e., inaccurate knowledge of the sail’s optical properties and the presence of wrinkles on the sail membrane. A proximal policy optimization (PPO) algorithm is used to train the agent and derive the control policy that associates the optimal sail attitude with each dynamic state. First, the agent is trained assuming deterministic unperturbed dynamics, and the results are compared with optimal solutions found by an indirect optimization method, thus demonstrating the effectiveness of this approach. Next, two stochastic scenarios are analysed. In the first, the optical coefficients of the sail are assumed to be random variables with Gaussian distribution, which leads to random variations in the sail characteristic acceleration. In the second scenario, wrinkles on the sail membrane are taken into account, resulting in a misalignment of the thrust vector with respect to a perfectly smooth surface. Both phenomena are modelled based on experimental measurements available in the literature in order to perform realistic analyses. In the stochastic scenarios, Monte Carlo simulations are performed using the trained policies, demonstrating that the reinforcement learning approach is capable of finding near time-optimal solutions, while also being robust to the sources of uncertainty considered.
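The first stochastic scenario treats the sail's optical coefficients as Gaussian random variables, which in turn perturbs the characteristic acceleration sampled in each Monte Carlo run. A minimal sketch of that sampling step is shown below; the nominal coefficient values, the uncertainty level, and the simplified force model (thrust scaling with (1 + ρs)/2) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical sketch (not the authors' code): sample Gaussian-perturbed
# sail optical coefficients per Monte Carlo episode and compute the
# resulting random characteristic acceleration.

rng = np.random.default_rng(0)

# Nominal reflectivity (rho) and specular-reflection fraction (s);
# values and the 2% standard deviation are illustrative assumptions.
RHO_NOM, S_NOM = 0.88, 0.94
SIGMA = 0.02
A_C_NOM = 1.0e-4  # nominal characteristic acceleration [km/s^2], illustrative

def sample_characteristic_acceleration(n: int) -> np.ndarray:
    """Draw n realizations of the characteristic acceleration under
    Gaussian uncertainty on the optical coefficients, using a simplified
    model in which thrust scales with the force factor (1 + rho*s)/2."""
    rho = rng.normal(RHO_NOM, SIGMA, n)
    s = rng.normal(S_NOM, SIGMA, n)
    nominal_factor = (1.0 + RHO_NOM * S_NOM) / 2.0
    return A_C_NOM * ((1.0 + rho * s) / 2.0) / nominal_factor

# One draw per Monte Carlo episode; the trained PPO policy would then be
# rolled out with this perturbed acceleration held fixed for the episode.
samples = sample_characteristic_acceleration(1000)
print(f"mean = {samples.mean():.3e}, std = {samples.std():.3e}")
```

In a full pipeline, each sampled acceleration would parameterize one rollout of the trained policy, and the distribution of resulting flight times would quantify robustness, as in the paper's Monte Carlo analysis.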
Updated: 2024-11-09