Time series-based machine learning for forecasting multivariate water quality in full-scale drinking water treatment with various reagent dosages,Water Research

当前位置： X-MOL 学术 › Water Res. › 论文详情

Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)

Time series-based machine learning for forecasting multivariate water quality in full-scale drinking water treatment with various reagent dosages
Water Research ( IF 11.4 ) Pub Date : 2024-11-09 , DOI: 10.1016/j.watres.2024.122777
Hongjiao Pang, Yawen Ben, Yong Cao, Shen Qu, Chengzhi Hu

Accurately predicting drinking water quality is critical for intelligent water supply management and for maintaining the stability and efficiency of water treatment processes. This study presents an optimized time series machine learning approach for accurately predicting multivariate drinking water quality, explicitly considering the time-dependent effects of reagent dosing. By leveraging data from a full-scale treatment plant, we constructed feature-engineered time series datasets incorporating influent water quality parameters, reagent dosages and effluent water characteristics. Seven predictive models, including both traditional machine learning (ML) and deep learning (DL) models were developed and rigorously evaluated against a naive mean baseline model. Our results demonstrate that traditional ML models, enhanced with time feature engineering, rivaled the performance of both widely used DL models and the naive mean baseline model. Specifically, an XGBoost model achieved superior prediction accuracy in dynamically forecasting four water quality characteristics at a 12-hour time lag step, outperforming the naive baseline model by 3-4% in terms of Mean Absolute Percentage Error (MAPE). This finding underscores the importance of incorporating a 12-hour interval to effectively capture the delayed impact of reagent dosing on water quality prediction. Furthermore, SHAP model interpretability analysis provided valuable insights into the XGBoost model's decision-making process, revealing its strong data-driven foundation aligned with established water treatment principles. This research highlights the significant potential of optimized machine learning techniques for enhancing water purification processes and enabling more informed, data-driven decision-making in the water supply industry.

中文翻译：

基于时间序列的机器学习，用于预测各种试剂剂量的全尺寸饮用水处理中的多变量水质

准确预测饮用水水质对于智能供水管理以及保持水处理过程的稳定性和效率至关重要。本研究提出了一种优化的时间序列机器学习方法，用于准确预测多变量饮用水质量，明确考虑了试剂加样的时间依赖性影响。通过利用来自全面处理厂的数据，我们构建了特征工程的时间序列数据集，其中包含进水水质参数、试剂剂量和出水特性。开发了 7 个预测模型，包括传统的机器学习（ML）和深度学习（DL）模型，并根据朴素平均基线模型进行了严格评估。我们的结果表明，传统的 ML 模型通过时间特征工程得到增强，其性能可与广泛使用的 DL 模型和朴素平均基线模型的性能相媲美。具体来说，XGBoost 模型在 12 小时时滞步长内动态预测四个水质特征方面实现了卓越的预测精度，在平均绝对百分比误差（MAPE）方面比初始基线模型高出 3-4%。这一发现强调了纳入 12 小时间隔的重要性，以有效捕捉试剂加样对水质预测的延迟影响。此外，SHAP 模型可解释性分析为 XGBoost 模型的决策过程提供了有价值的见解，揭示了其与既定水处理原则一致的强大数据驱动基础。这项研究强调了优化的机器学习技术在改进水净化过程和在供水行业中实现更明智、数据驱动的决策方面的巨大潜力。

更新日期：2024-11-09

点击分享查看原文

点击收藏

阅读更多本刊新发论文本刊介绍/投稿指南