Effects of multicollinearity and data granularity on regression models of stream temperature
Journal of Hydrology (IF 5.9), Pub Date: 2024-06-27, DOI: 10.1016/j.jhydrol.2024.131572
Halil I. Dertli, Daniel B. Hayes, Troy G. Zorn

Water temperature is a key factor influencing the biota of stream ecosystems. Hence, understanding the environmental drivers of stream temperature is important for robust prediction of conditions and effective management of stream communities. Linear regression models are commonly used for prediction, but their predictive capacity and interpretability can be significantly affected by model complexity and by the structure of the input data. In some cases, researchers may be forced to favor either predictive power or interpretability at the expense of the other. Therefore, insight into the relationships among model fit, correlation among predictor variables (i.e., multicollinearity), and the level of temporal aggregation of data (i.e., data granularity) may help reduce such trade-offs. In this paper, we investigated these relationships within a hierarchical set of multiple linear regression (MLR) models of the environmental factors influencing stream temperature dynamics. Our findings showed that as the number of predictor variables (i.e., model complexity) increased, the magnitude of multicollinearity in MLR models increased, but so did model fit. The results also revealed that using data averaged over longer time frames (i.e., coarser data granularity) yielded high multicollinearity, as indexed by variance inflation factor (VIF) values for all model predictors. This led to higher variance in parameter estimates (i.e., parameter instability) and potential challenges in model interpretation, as the signs of parameter estimates changed in many of the streams examined. Multicollinearity was not the only cause of these sign changes, as they were also observed in simple linear regression models across varying levels of data granularity. Based on our findings, we conclude that the choice of data granularity is an important consideration in multiple regression modeling, with profound implications for model interpretability.
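Two quantitative ideas in the abstract, VIF as an index of multicollinearity and temporal averaging as coarser data granularity, can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' method or data. It computes VIF_j = 1 / (1 - R_j^2) by regressing each predictor on all the others (with an intercept) using NumPy least squares. The predictors are synthetic and hypothetical: `air_temp`, `solar`, and `discharge` are illustrative names only, each built from a shared seasonal cycle plus independent daily noise. The helper `aggregate` (also hypothetical) averages consecutive blocks of days. Averaging suppresses the independent noise while keeping the shared seasonal signal, so the predictors become more correlated and their VIFs tend to rise, which is the direction of the effect the abstract reports.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_obs x n_predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination
    from an OLS regression of predictor j on all other predictors plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def aggregate(X, window):
    """Hypothetical helper: mean of consecutive blocks of `window` rows.

    Trailing rows that do not fill a whole block are dropped.
    """
    n = (X.shape[0] // window) * window
    return X[:n].reshape(-1, window, X.shape[1]).mean(axis=1)


# Synthetic daily predictors (assumption: shared seasonal cycle + independent noise).
rng = np.random.default_rng(0)
days = np.arange(3 * 364)                       # three 364-day "years"
season = np.sin(2 * np.pi * days / 364)         # common seasonal signal
air_temp = season + rng.normal(scale=1.0, size=days.size)
solar = season + rng.normal(scale=1.0, size=days.size)
discharge = -season + rng.normal(scale=1.0, size=days.size)
X_daily = np.column_stack([air_temp, solar, discharge])

# Compare VIFs at increasingly coarse granularity.
for label, window in [("daily", 1), ("weekly", 7), ("4-weekly", 28)]:
    print(label, np.round(vif(aggregate(X_daily, window)), 2))
```

With this synthetic setup, the daily VIFs should stay modest (near 1), while the 28-day averages should give noticeably larger values; the exact numbers depend on the random seed and the assumed noise scale, and they say nothing about the streams analysed in the paper.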

Updated: 2024-06-27