A self-imitation learning approach for scheduling evaporation and encapsulation stages of OLED display manufacturing systems,Robotics and Computer-Integrated Manufacturing



Robotics and Computer-Integrated Manufacturing (IF 9.1), Pub Date: 2024-11-29, DOI: 10.1016/j.rcim.2024.102917
Donghun Lee, In-Beom Park, Kwanho Kim

In modern organic light-emitting diode (OLED) manufacturing systems, scheduling is a key decision-making problem for improving productivity. In particular, scheduling the evaporation and encapsulation stages is subject to complicated constraints such as job splitting, preventive maintenance, machine eligibility, family setups, and heterogeneous job release times. Reinforcement learning (RL) has drawn increasing attention in recent years as an alternative way to solve such complicated scheduling problems efficiently. However, RL-based scheduling methods may perform unsatisfactorily because machine eligibility restrictions induce unexpected correlations between actions, which makes the credit assignment problem harder to address. To minimize total tardiness, this article proposes a self-imitation learning-based scheduling method in which an agent exploits its past good experiences to explore efficiently. Furthermore, a novel return design that accounts for machine eligibility restrictions is introduced to overcome the credit assignment problem. To evaluate the effectiveness and efficiency of the proposed method, numerical experiments are carried out on datasets that simulate real-world OLED display manufacturing systems. The experimental results show that the proposed method outperforms the baselines, including rule-based methods, meta-heuristics, and another deep RL (DRL)-based method, in terms of total tardiness, while requiring less computation time than the meta-heuristics.
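The two central notions in the abstract, total tardiness and self-imitation learning, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `value_coef`-free loss form, and the sample numbers are assumptions made for illustration, and the paper's eligibility-aware return design is not described in the abstract, so it is not reproduced here. The loss follows the generic self-imitation idea of learning only from past transitions whose observed return exceeded the agent's value estimate.

```python
from typing import List, Sequence, Tuple


def total_tardiness(completion_times: Sequence[float],
                    due_dates: Sequence[float]) -> float:
    """Sum of max(0, completion - due date) over all jobs."""
    if len(completion_times) != len(due_dates):
        raise ValueError("each job needs one completion time and one due date")
    return sum(max(0.0, c - d) for c, d in zip(completion_times, due_dates))


def sil_losses(log_probs: Sequence[float],
               returns: Sequence[float],
               values: Sequence[float]) -> Tuple[float, float]:
    """Generic self-imitation losses over a batch of stored transitions.

    Only transitions whose return R exceeded the value estimate V
    contribute, via the clipped advantage max(R - V, 0).
    (Hypothetical helper; the paper's own return design is not shown.)
    """
    if not (len(log_probs) == len(returns) == len(values)):
        raise ValueError("inputs must have the same length")
    clipped: List[float] = [max(r - v, 0.0) for r, v in zip(returns, values)]
    n = len(clipped)
    # Policy term: raise the log-probability of actions that did better than expected.
    policy_loss = sum(-lp * a for lp, a in zip(log_probs, clipped)) / n
    # Value term: pull the value estimate up toward the better observed return.
    value_loss = sum(0.5 * a * a for a in clipped) / n
    return policy_loss, value_loss


if __name__ == "__main__":
    # Three jobs: the first finishes early, the other two finish late.
    print(total_tardiness([5, 9, 12], [6, 8, 10]))
    # Two stored transitions: only the first beat its value estimate.
    print(sil_losses([-1.0, -2.0], [3.0, 1.0], [1.0, 2.0]))
```

In a scheduling setting, one plausible choice (again an assumption) is to define the return as the negative of the tardiness incurred, so that schedules with lower total tardiness count as the "good experiences" that are replayed.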


Updated: 2024-11-29
