Machine learning models with innovative outlier detection techniques for predicting heavy metal contamination in soils
Journal of Hazardous Materials (IF 12.2), Pub Date: 2024-11-19, DOI: 10.1016/j.jhazmat.2024.136536
Ram Proshad, S.M. Asharaful Abedin Asha, Rong Tan, Yineng Lu, Md Anwarul Abedin, Zihao Ding, Shuangting Zhang, Ziyi Li, Geng Chen, Zhuanjun Zhao

Machine learning (ML) models for predicting heavy metals (HMs) can give inconsistent outputs because of dataset outliers, which affect model reliability and accuracy. A comprehensive technique combining machine learning with advanced statistical methods was applied to assess the effects of data outliers on ML models. Ten ML models, each paired with three outlier detection methods, predicted Cr, Ni, Cd, and Pb in Narayanganj soils. XGBoost with density-based spatial clustering of applications with noise (DBSCAN) improved model efficacy (R²). The R² values for Cr, Ni, Cd, and Pb increased by 11.11 %, 6.33 %, 14.47 %, and 5.68 %, respectively, indicating that outliers affected the model's HM predictions. Based on feature importance, soil factors accounted for 80 % of Cr, 72.61 % of Ni, 53.35 % of Cd, and 63.47 % of Pb concentrations. Contamination factor prediction showed considerable contamination for Cr, Ni, and Cd. Local indicators of spatial association (LISA) identified Cd (55.4 %), Cr (49.3 %), and Pb (47.3 %) as the significant pollutants (p < 0.05). Moran's I values for Cr, Ni, Cd, and Pb were 0.65, 0.58, 0.60, and 0.66, respectively, indicating strong positive spatial autocorrelation and clusters of similar contamination. Overall, this work assessed the influence of data outliers on ML models for soil HM contamination prediction and identified crucial regions that require rapid conservation measures.
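
To illustrate the DBSCAN step named in the abstract, the following is a minimal sketch of how density-based noise flagging can remove outliers before a model is trained. It is not the authors' implementation: the function name `dbscan_noise_mask`, the sample values, and the `eps` and `min_samples` settings are all assumptions chosen for illustration, since the abstract does not report them. The sketch uses only the standard DBSCAN definitions: a core point has at least `min_samples` points (itself included) within distance `eps`, and a noise point is neither a core point nor within `eps` of one.

```python
from math import dist


def dbscan_noise_mask(points, eps, min_samples):
    """Return a list of booleans, True where DBSCAN would label a point as noise.

    Hypothetical helper for illustration; eps and min_samples are assumptions.
    """
    n = len(points)
    # neighbours[i] lists every index within eps of point i (point i included).
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_samples points in its eps-neighbourhood.
    is_core = [len(nb) >= min_samples for nb in neighbours]
    # Noise: not a core point and not within eps of any core point.
    return [
        not is_core[i] and not any(is_core[j] for j in neighbours[i])
        for i in range(n)
    ]


# Made-up standardized feature rows, not data from the paper.
samples = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense group
    (5.0, 5.0),                                      # isolated sample
]
noise = dbscan_noise_mask(samples, eps=0.5, min_samples=3)

# Keep only the rows that were not flagged as noise.
cleaned = [p for p, is_noise in zip(samples, noise) if not is_noise]
print(noise)
```

In a workflow like the one the abstract describes, the `cleaned` rows would then be passed to a regressor such as XGBoost, and R² would be compared with and without the filtering step. That comparison is not shown here.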

Updated: 2024-11-19