Chemical Engineering Journal (IF 13.3), Pub Date: 2023-09-15, DOI: 10.1016/j.cej.2023.146069. Yi Zhang, Zhangmu Jing, Yijing Feng, Shuo Chen, Yeqing Li, Yongming Han, Lu Feng, Junting Pan, Mahmoud Mazarji, Hongjun Zhou, Xiaonan Wang, Chunming Xu
Identifying key factors provides important guidance for understanding complex anaerobic digestion (AD) systems. This study proposed a multi-layer automated machine learning framework to understand the complex interactions in AD systems and to explore key factors at the environmental-factor, microorganism, and system levels. The first layer of the framework identified hydraulic residence time (HRT) as the most important environmental factor, with an optimal range of 33–45 d. In the second layer, Methanocelleus (optimal relative abundance (ORA) = 3.0%) and Candidatus_Caldatribacterium (ORA = 1.7%) were found to be the key archaeon and bacterium, respectively. Furthermore, predicting the key microorganisms from environmental factors and the remaining microbial data showed the essential roles of Methanothermobacter and Acetomicrobium. The third layer, which searched for the optimal combination of data variables for predicting biogas production, showed that combining archaeal genera with environmental factors gave the most accurate prediction (root mean square error (RMSE) = 84.21). GBM had the best model performance and prediction accuracy among all the built-in models. Based on the optimal GBM model, the system-level analysis showed that HRT was the most important variable. However, keeping the most important microorganism, Methanocelleus, within its appropriate survival range is also essential for achieving optimal biogas production. This research explores key parameters at various levels through automated machine learning techniques and is expected to provide guidance for understanding the complex architecture of industrial and laboratory AD systems.
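The third layer ranks candidate combinations of input variables by RMSE. The following is a minimal sketch of that comparison, not the authors' implementation: the `rmse` helper, the candidate names, and all numeric values are hypothetical and serve only to illustrate how the lowest-RMSE combination would be selected.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("need at least one sample")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical biogas production measurements (illustrative, not from the study).
observed = [500.0, 620.0, 710.0, 450.0]

# Hypothetical predictions from models trained on different variable sets.
candidates = {
    "environmental factors only": [460.0, 670.0, 690.0, 500.0],
    "archaea genera + environmental factors": [480.0, 650.0, 700.0, 470.0],
}

# Score every candidate and keep the one with the lowest RMSE.
scores = {name: rmse(observed, preds) for name, preds in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.2f}")
print("best combination:", best)
```

Because RMSE is expressed in the same units as the predicted quantity, a lower value means the predictions sit closer to the measured biogas production on average.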
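Title: Exploring key factors of anaerobic digestion using automated machine learning techniques: environmental factors, microorganisms, and the system level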