Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-11-14, DOI: 10.1007/s40747-024-01667-x. Authors: Zheng Liu, Wei Xiong, Zhuoya Jia, Chi Han
This paper investigates the agile optical satellite scheduling problem, which requires arranging both an observation sequence and the observation actions for a set of observation tasks. Existing research mainly aims to maximize the number of completed tasks or their total priority, but ignores the influence of the observation actions on imaging quality. In addition, conventional exact and heuristic methods struggle to obtain a high-quality solution in a short time because of the complicated constraints and large solution space of this problem. This paper therefore proposes a two-stage scheduling framework based on two-stage deep reinforcement learning. First, the scheduling process is decomposed into a task sequencing stage and an observation scheduling stage, and a mathematical model with complex constraints and two-stage optimization objectives is established to describe the problem. Then, a pointer network with a local selection mechanism and a rough pruning mechanism is constructed as the sequencing network, which generates an executable task sequence in the task sequencing stage. Next, in the observation scheduling stage, a decomposition strategy splits the executable task sequence into multiple sub-sequences, and the observation scheduling process of these sub-sequences is modeled as a concatenated Markov decision process. A neural network is designed as the observation scheduling network to determine the observation actions for the sequenced tasks, and it is trained with the soft actor-critic algorithm. Finally, extensive experiments show that the proposed method, along with the designed mechanisms and strategy, is superior to the comparison algorithms in terms of solution quality, generalization performance, and computation efficiency.
Title: Two-stage deep reinforcement learning method for agile optical satellite scheduling problem
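The two-stage decomposition described in the abstract can be illustrated with a small sketch. This is not the paper's method: the pointer network and the soft actor-critic policy are replaced here by simple hand-written rules, and every name (`Task`, `TRANSITION`, `sequence_tasks`, `decompose`, `schedule_subsequence`, `two_stage_schedule`) is hypothetical. It also assumes a fixed transition time between observations and that imaging quality is best when an observation is centered in its visible time window. The sketch only shows how a sequencing stage with pruning can feed sub-sequences to an observation scheduling stage, with the clock carried from one sub-sequence to the next.

```python
from dataclasses import dataclass
from typing import List, Tuple

TRANSITION = 2.0  # assumed fixed attitude-maneuver time between two observations


@dataclass(frozen=True)
class Task:
    name: str
    window_start: float  # start of the visible time window
    window_end: float    # end of the visible time window
    duration: float      # required imaging time
    priority: float


def sequence_tasks(tasks: List[Task]) -> List[Task]:
    """Stage 1 stand-in (not a pointer network): greedy order plus rough pruning."""
    sequence, clock = [], 0.0
    for task in sorted(tasks, key=lambda t: (t.window_start, -t.priority)):
        earliest = max(clock, task.window_start)
        if earliest + task.duration > task.window_end:
            continue  # pruned: the task cannot finish inside its window
        sequence.append(task)
        clock = earliest + task.duration + TRANSITION
    return sequence


def decompose(sequence: List[Task]) -> List[List[Task]]:
    """Split the sequence where neighbouring tasks cannot constrain each other."""
    subs: List[List[Task]] = []
    for task in sequence:
        if subs and subs[-1][-1].window_end + TRANSITION > task.window_start:
            subs[-1].append(task)
        else:
            subs.append([task])
    return subs


def schedule_subsequence(sub: List[Task], clock: float) -> Tuple[List[Tuple[str, float]], float]:
    """Stage 2 stand-in (not a learned policy): start near the window centre."""
    # Backward pass: latest start of each task that keeps later tasks feasible.
    latest = [0.0] * len(sub)
    finish_by = float("inf")
    for i in reversed(range(len(sub))):
        latest[i] = min(sub[i].window_end, finish_by) - sub[i].duration
        finish_by = latest[i] - TRANSITION
    # Forward pass: clamp the assumed ideal (centred) start into the feasible range.
    plan = []
    for task, last_ok in zip(sub, latest):
        ideal = (task.window_start + task.window_end - task.duration) / 2
        start = min(max(ideal, clock, task.window_start), last_ok)
        plan.append((task.name, start))
        clock = start + task.duration + TRANSITION
    return plan, clock


def two_stage_schedule(tasks: List[Task]) -> List[Tuple[str, float]]:
    """Run both stages; the clock links consecutive sub-sequences."""
    plan, clock = [], 0.0
    for sub in decompose(sequence_tasks(tasks)):
        sub_plan, clock = schedule_subsequence(sub, clock)
        plan.extend(sub_plan)
    return plan


if __name__ == "__main__":
    demo = [
        Task("A", 0.0, 10.0, 4.0, 1.0),
        Task("B", 5.0, 14.0, 4.0, 2.0),
        Task("C", 6.0, 9.0, 3.0, 1.0),
        Task("D", 30.0, 40.0, 4.0, 1.0),
    ]
    print(two_stage_schedule(demo))
```

In this toy setup, the sequencing stage fixes the order and drops tasks that cannot fit, while the scheduling stage only chooses start times within that order. Splitting at large gaps keeps each sub-problem small, which is the intuition behind decomposing the sequence before choosing observation actions.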