Insights into the prediction uncertainty of machine-learning-based digital soil mapping through a local attribution approach
Soil (IF 5.8), Pub Date: 2024-09-30, DOI: 10.5194/soil-10-679-2024. Jeremy Rohmer, Stephane Belbeze, Dominique Guyonnet
Abstract. Machine learning (ML) models have become key ingredients for digital soil mapping. To improve the interpretability of their predictions, diagnostic tools such as the widely used local attribution approach known as SHapley Additive exPlanations (SHAP) have been developed. However, the analysis of ML model predictions is only one part of the problem, and there is an interest in obtaining deeper insights into the drivers of the prediction uncertainty as well, i.e. explaining why an ML model is confident given the set of chosen covariate values in addition to why the ML model delivered some particular results. In this study, we show how to apply SHAP to local prediction uncertainty estimates for a case of urban soil pollution – namely, the presence of petroleum hydrocarbons in soil in Toulouse (France), which pose a health risk via vapour intrusion into buildings, direct soil ingestion, and groundwater contamination. Our results show that the drivers of the prediction best estimates are not necessarily the drivers of confidence in these predictions, and we identify those leading to a reduction in uncertainty. Our study suggests that decisions regarding data collection and covariate characterisation as well as communication of the results should be made accordingly.
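The idea of attributing the prediction uncertainty, and not only the best estimate, to the covariates can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy ensemble of three linear models (the hypothetical `ENSEMBLE`), takes the ensemble mean as the best estimate and the ensemble spread as the uncertainty, and computes exact Shapley values against a single baseline point rather than using the SHAP library. All names and numbers are illustrative.

```python
from itertools import combinations
from math import factorial
from statistics import mean, pstdev


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Hypothetical ensemble (assumption): each member is (intercept, coefficients).
# Members agree on covariate 0 and disagree on covariate 1.
ENSEMBLE = [
    (0.0, (2.0, 0.1)),
    (0.0, (2.0, 0.9)),
    (0.0, (2.0, -0.7)),
]


def member_predictions(x):
    return [b + sum(c * v for c, v in zip(coefs, x)) for b, coefs in ENSEMBLE]


def best_estimate(x):
    """Ensemble mean, used here as the prediction best estimate."""
    return mean(member_predictions(x))


def uncertainty(x):
    """Ensemble spread (population standard deviation), an assumed uncertainty measure."""
    return pstdev(member_predictions(x))


x = [1.0, 1.0]         # location to explain (illustrative covariate values)
baseline = [0.0, 0.0]  # reference point standing in for "covariate absent"

phi_mean = shapley_values(best_estimate, x, baseline)
phi_unc = shapley_values(uncertainty, x, baseline)

print("attribution of best estimate:", phi_mean)
print("attribution of uncertainty:  ", phi_unc)
```

In this toy setting covariate 0 dominates the attribution of the best estimate while covariate 1 dominates the attribution of the uncertainty, which mirrors the abstract's point that the drivers of a prediction are not necessarily the drivers of confidence in it. Exact enumeration scales exponentially with the number of covariates, so a real application would rely on an approximation such as the one provided by SHAP.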
Updated: 2024-09-30