Reinforcement learning based temperature control of a fermentation bioreactor for ethanol production
Biotechnology and Bioengineering (IF 3.5), Pub Date: 2024-06-28, DOI: 10.1002/bit.28784
Nagabhushanamgari Rajasekhar, Thota Karunakaran Radhakrishnan, Samsudeen Naina Mohamed
Ethanol production is a significant industrial bioprocess for energy. The primary objective of this study is to control the reactor temperature so as to obtain the desired product, ethanol. Advanced model-based control systems face challenges due to model-process mismatch; reinforcement learning (RL), a class of machine learning in which agents learn policies directly from the environment, can help overcome this. Hence, an RL algorithm called twin delayed deep deterministic policy gradient (TD3) is employed. Control of the reactor temperature is divided into two approaches, namely unconstrained and constrained control. TD3 with various reward functions is tested on a nonlinear bioreactor model, and the results are compared with a popular existing RL algorithm, deep deterministic policy gradient (DDPG), using mean squared error (MSE) as the performance measure. In unconstrained control of the bioreactor, the TD3-based controller designed with the integral absolute error (IAE) reward yields a lower MSE of 0.22, whereas DDPG produces an MSE of 0.29. Similarly, for the constrained controller, the TD3-based controller designed with the IAE reward yields a lower MSE of 0.38, whereas DDPG produces an MSE of 0.48. In addition, the TD3-trained agent successfully rejects disturbances in the input flow rate and inlet temperature, and tracks a setpoint change, with better performance metrics.
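To make the evaluation setup concrete, the sketch below shows how an IAE-shaped per-step reward and the MSE performance measure described in the abstract could look in code. This is a minimal illustration under stated assumptions: the first-order thermal model, the setpoint of 30 °C, and the proportional policy (standing in for a trained TD3 actor) are all hypothetical and not taken from the paper.

```python
import numpy as np

def iae_reward(t_meas: float, t_set: float) -> float:
    """Per-step reward shaped as the negative absolute tracking error (IAE-style)."""
    return -abs(t_set - t_meas)

def mse(trajectory, t_set: float) -> float:
    """Mean squared error of the temperature trajectory against the setpoint."""
    traj = np.asarray(trajectory, dtype=float)
    return float(np.mean((traj - t_set) ** 2))

def simulate(policy, t0=25.0, t_set=30.0, steps=200, dt=0.1, tau=2.0):
    """Roll out a policy on a toy first-order thermal model: dT/dt = (u - T)/tau.

    This dynamics model is purely illustrative, not the nonlinear bioreactor
    model used in the study.
    """
    t, temps, total_reward = t0, [], 0.0
    for _ in range(steps):
        u = policy(t, t_set)            # control action (e.g., jacket temperature)
        t = t + dt * (u - t) / tau      # Euler step of the toy dynamics
        temps.append(t)
        total_reward += iae_reward(t, t_set)
    return temps, total_reward

# Simple proportional policy standing in for a learned TD3 actor.
p_policy = lambda t, t_set: t_set + 4.0 * (t_set - t)
temps, episode_reward = simulate(p_policy)
```

In a full TD3 setup the `policy` would be the trained actor network, and `episode_reward` is what the agent maximizes during training, while `mse` serves only as the offline evaluation metric, mirroring the comparison reported in the abstract.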
