Establishing performance criteria for evaluating watershed-scale sediment and nutrient models at fine temporal scales
Water Research (IF 11.4), Pub Date: 2025-01-18, DOI: 10.1016/j.watres.2025.123156
Aayush Pandit, Sarah Hogan, David T. Mahoney, William I. Ford, James F. Fox, Christopher Wellen, Admin Husic

Watershed water quality models are mathematical tools used to simulate processes related to water, sediment, and nutrients. These models provide a framework that can be used to inform decision-making and the allocation of resources for watershed management. Therefore, it is critical to answer the question “when is a model good enough?” Established performance evaluation criteria, or thresholds for what is considered a ‘good’ model, provide common benchmarks against which model performance can be compared. Since the publication of prior meta-analyses on this topic, developments in the last decade, such as advances in high-performance computing, the proliferation of aquatic sensors, and the development of machine learning algorithms, necessitate further investigation. We surveyed the literature for quantitative model performance measures, including the Nash-Sutcliffe efficiency (NSE), with a particular focus on process-based models operating at fine temporal scales, as their performance evaluation criteria are presently underdeveloped. The synthesis dataset was used to assess the influence of temporal resolution (sub-daily, daily, and monthly), calibration duration (< 3 years, 3 to 8 years, and > 8 years), and constituent target units (concentration, load, and yield) on model performance. The synthesis dataset includes 229 model applications, from which we use bootstrapping and personal modeling experience to establish sub-daily and daily performance evaluation criteria for flow, sediment, total nutrient, and dissolved nutrient models. For daily model evaluation, the NSE for sediment, total nutrient, and dissolved nutrient models should exceed 0.45, 0.30, and 0.35, respectively, for ‘satisfactory’ performance. Model performance generally improved when transitioning from short (< 3 years) to medium (3 to 8 years) calibration durations, but no additional gain was observed with longer (> 8 years) calibration.
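As an illustration of how the daily criteria above might be applied, the following minimal Python sketch computes the NSE from paired observed and simulated series and compares it with the ‘satisfactory’ thresholds quoted in the abstract. The function names, the dictionary keys, and the percentile bootstrap of the median are assumptions made for illustration only; the abstract does not describe the authors' bootstrap procedure, so this should not be read as their method.

```python
import random
import statistics
from typing import Sequence, Tuple

# Daily 'satisfactory' NSE thresholds as quoted in the abstract.
# The dictionary keys are hypothetical labels chosen for this sketch.
DAILY_SATISFACTORY_NSE = {
    "sediment": 0.45,
    "total_nutrient": 0.30,
    "dissolved_nutrient": 0.35,
}


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - sse / sst


def is_satisfactory_daily(constituent: str, nse_value: float) -> bool:
    """True if the NSE exceeds the quoted daily threshold for the constituent."""
    return nse_value > DAILY_SATISFACTORY_NSE[constituent]


def bootstrap_median_ci(
    values: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Approximate percentile-bootstrap interval for the median of reported NSE values.

    Assumption: a generic percentile bootstrap, not the procedure used in the paper.
    """
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]
```

A simulation that only reproduces the observed mean scores an NSE of 0, so any positive threshold requires the model to explain some of the variability around that mean.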
Dissolved nutrient models calibrated to load (e.g., kg/s) outperformed those calibrated to concentration (e.g., mg/L), whereas selection of target units was not significant for sediment and total nutrient models. We recommend the use of concentration rather than load as a water quality modeling target, as load may be biased by strong flow model performance whereas concentration provides a flow-independent measure of performance. Although the performance criteria developed herein are based on process-based models, they may be useful in assessing machine learning model performance. We demonstrate one such assessment on a recent deep learning model of daily nitrate prediction across the United States. The guidance presented here is intended to be used alongside, rather than to replace, the experience and modeling judgement of engineers and scientists who work to maintain our collective water resources.
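The point about load being biased by flow performance can be illustrated with a small, entirely synthetic example. In the Python sketch below, the simulated concentration is simply the observed mean (no skill, NSE of 0), while flow is assumed to be simulated perfectly; the resulting load NSE is nonetheless high because load is the product of concentration and flow. All numbers and variable names are made up for illustration and are not taken from the paper.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Synthetic observations (hypothetical values): flow in m^3/s, concentration in mg/L.
obs_flow = [1.0, 2.0, 10.0, 3.0, 1.0]
obs_conc = [2.0, 1.5, 3.0, 2.5, 1.0]

# Assumed model: flow is reproduced perfectly, but concentration is just the
# observed mean at every time step, i.e. the concentration model has no skill.
sim_flow = list(obs_flow)
mean_conc = sum(obs_conc) / len(obs_conc)
sim_conc = [mean_conc] * len(obs_conc)

# Load (g/s) = concentration (mg/L) * flow (m^3/s), since 1 mg/L * 1 m^3/s = 1 g/s.
obs_load = [c * q for c, q in zip(obs_conc, obs_flow)]
sim_load = [c * q for c, q in zip(sim_conc, sim_flow)]

conc_nse = nse(obs_conc, sim_conc)
load_nse = nse(obs_load, sim_load)

print(f"concentration NSE = {conc_nse:.2f}")
print(f"load NSE          = {load_nse:.2f}")
```

In this toy case the load NSE is roughly 0.82 even though the concentration NSE is 0, which is the kind of flow-driven inflation that the recommendation to evaluate concentration is meant to avoid.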

Updated: 2025-01-18