Dynamic resource matching in manufacturing using deep reinforcement learning
European Journal of Operational Research (IF 6.0) Pub Date: 2024-05-15, DOI: 10.1016/j.ejor.2024.05.027
Saunak Kumar Panda, Yisha Xiang, Ruiqi Liu

Matching plays an important role in the logical allocation of resources across a wide range of industries. The benefits of matching have been increasingly recognized in manufacturing industries. In particular, capacity sharing has received much attention recently. In this paper, we consider the problem of dynamically matching demand and capacity types of manufacturing resources. We formulate the multi-period, many-to-many manufacturing resource-matching problem as a sequential decision process. The formulated manufacturing resource-matching problem involves large state and action spaces, and it is not practical to accurately model the joint distribution of the various types of demand. To address the curse of dimensionality and the difficulty of explicitly modeling the transition dynamics, we use a model-free deep reinforcement learning approach to find optimal matching policies. Moreover, to tackle the issues of infeasible actions and slow convergence due to initially biased estimates caused by the maximum operator in Q-learning, we introduce two penalties into the traditional Q-learning algorithm: a domain knowledge-based penalty derived from a prior policy and an infeasibility penalty that enforces the demand–supply constraints. We establish theoretical results on the convergence of our domain knowledge-informed Q-learning, providing a performance guarantee for small-size problems. For large-size problems, we further inject our modified approach into the deep deterministic policy gradient (DDPG) algorithm, which we refer to as domain knowledge-informed DDPG (DKDDPG). In our computational study, including small- and large-scale experiments, DKDDPG consistently outperformed traditional DDPG and other RL algorithms, yielding higher rewards and demonstrating greater efficiency in training time and episodes.
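To make the penalized Q-learning idea concrete, the sketch below shows one plausible way to fold the two penalties described in the abstract into a standard tabular Q-learning update: a domain-knowledge penalty for deviating from a prior (heuristic) matching policy, and an infeasibility penalty for matches that violate the demand–supply constraints. This is an illustrative reading of the abstract only; the environment interface (reset/step/is_feasible), the penalty forms, and all parameter names are assumptions, not the authors' implementation.

```python
# Illustrative sketch of domain knowledge-informed Q-learning (assumptions, not
# the paper's code): penalties are folded into the reward before the update.
import numpy as np

def dk_q_learning(env, prior_policy, n_states, n_actions,
                  episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1,
                  lambda_prior=1.0, lambda_infeasible=10.0):
    """Tabular Q-learning with two added penalties:
    - a domain-knowledge penalty for deviating from a prior (heuristic) policy,
    - an infeasibility penalty for matches violating demand-supply constraints.
    `env` is assumed to expose reset() -> s, step(a) -> (s', r, done), and
    is_feasible(s, a) -> bool; `prior_policy[s]` is the heuristic action."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # penalty 1: discourage deviating from the prior (domain) policy
            r -= lambda_prior * float(a != prior_policy[s])
            # penalty 2: strongly discourage infeasible demand-capacity matches
            if not env.is_feasible(s, a):
                r -= lambda_infeasible
            # standard Q-learning target with the penalized reward
            target = r + gamma * (0.0 if done else np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

In the large-scale setting the paper replaces the tabular update with DDPG, keeping the same penalized reward signal; the sketch above only illustrates the penalty mechanism for the small-size case.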
