Predictive air combat decision model with segmented reward allocation
Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-07-22, DOI: 10.1007/s40747-024-01556-3
Yundi Li, Yinlong Yuan, Yun Cheng, Liang Hua

In air combat missions, unmanned combat aerial vehicles (UCAVs) must take strategic actions to establish combat advantages that enable effective tracking and attacking of enemy UCAVs. Many reinforcement learning algorithms have been applied to UCAV air combat. However, most of them select policies based only on the current state of both sides, so they cannot track and attack effectively when the enemy performs large-angle maneuvers. These algorithms also fail to adapt to different situations, which leaves the UCAV at a disadvantage in some cases. To solve these problems, this paper proposes a predictive air combat decision model with segmented reward allocation for air combat tracking and attacking. Building on the air combat environment, we propose the prediction soft actor-critic (Pre-SAC) algorithm, which combines the predicted enemy states with the UCAV's own states for model training. This enables the UCAV to anticipate the enemy UCAV's next move and establish a greater air combat advantage for the friendly side. Furthermore, by combining a segmented reward allocation model with the Pre-SAC algorithm, we propose the segmented reward allocation soft actor-critic (Sra-SAC) algorithm, which addresses the UCAV's inability to adapt to different situations. The results show that the prediction-based Sra-SAC algorithm with segmented reward allocation outperforms the traditional soft actor-critic (SAC) algorithm in terms of overall reward, travel distance, and relative advantage.
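The two ideas named in the abstract, feeding a predicted enemy state to the policy and switching the reward by situation, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the constant-velocity predictor, the function names, the 5000 m threshold, and the 0.8/0.2 weights are all hypothetical, and the paper's actual prediction model and reward segments may differ.

```python
from typing import List, Sequence


def predict_next_position(prev: Sequence[float], curr: Sequence[float]) -> List[float]:
    """Hypothetical predictor: constant-velocity extrapolation, next = curr + (curr - prev)."""
    return [c + (c - p) for p, c in zip(prev, curr)]


def build_observation(
    own_state: Sequence[float],
    enemy_prev: Sequence[float],
    enemy_curr: Sequence[float],
) -> List[float]:
    """Concatenate own state, current enemy state, and the predicted enemy state.

    The resulting vector is what an actor-critic agent such as SAC would receive
    as input in this sketch.
    """
    predicted = predict_next_position(enemy_prev, enemy_curr)
    return list(own_state) + list(enemy_curr) + predicted


def segmented_reward(
    distance: float,
    angle_advantage: float,
    far_threshold: float = 5000.0,  # assumed segment boundary in metres
) -> float:
    """Hypothetical two-segment reward: the term weights depend on the situation."""
    distance_term = -distance / far_threshold
    if distance > far_threshold:
        # Far segment: closing the distance dominates.
        return 0.8 * distance_term + 0.2 * angle_advantage
    # Near segment: angular advantage dominates.
    return 0.2 * distance_term + 0.8 * angle_advantage


if __name__ == "__main__":
    obs = build_observation([0.0, 0.0, 1000.0], [100.0, 0.0, 1000.0], [150.0, 20.0, 1000.0])
    print(obs)
    print(segmented_reward(distance=10000.0, angle_advantage=1.0))
    print(segmented_reward(distance=2500.0, angle_advantage=1.0))
```

In this sketch the observation grows by one extra enemy-state block, so the policy network sees where the enemy is expected to be rather than only where it is now.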




Updated: 2024-07-22