Journal of Geodesy (IF 3.9), Pub Date: 2024-06-18, DOI: 10.1007/s00190-024-01855-0
Yangkang Yu, Ling Yang, Yunzhong Shen
The issue of outliers has been a research focus in geodesy. Data snooping, which is based on a statistical test known as the w-test, and its iterative form, iterative data snooping (IDS), are commonly used to diagnose outliers in linear models. When multiple outliers are present, however, these methods may suffer from masking and swamping effects, which limit their ability to detect and identify outliers. This contribution investigates the cause of the masking and swamping effects and proposes a new method to mitigate them. First, based on a division of the data, an extended form of the w-test and its reliability measure are presented, and data snooping and IDS are given a theoretical reinterpretation. Then, to alleviate masking and swamping, a new outlier diagnostic method and its iterative form are proposed: data refining and iterative data refining (IDR). In general, if the full set of observations is initially divided into an inlying set and an outlying set, data snooping can be viewed as a process that moves outliers from the inlying set to the outlying set. Conversely, data refining is the reverse process: it transfers inliers from the outlying set back to the inlying set. Both theoretical analysis and practical examples show that IDR is more robust than IDS because it alleviates the masking and swamping effects, although it may carry a higher risk of precision loss when the data are insufficient.
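The two directions described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' method: it assumes a linear model y = A x + e with diagonal weights P, a known a priori standard deviation sigma0, and a two-sided significance level alpha = 0.001. `iterative_data_snooping` is classic IDS written as moves from the inlying to the outlying set: at each step it refits on the inlying set, computes Baarda's w-statistic w_i = e_i / (sigma0 * sqrt(q_i)) with q_i = 1/p_i - a_i N^-1 a_i^T, and moves the observation with the largest |w_i| if it exceeds the critical value. The paper's extended w-test is not reproduced here. The hypothetical function `data_refining_sketch` only illustrates the reverse direction: it tests each outlying observation by its standardized predicted residual from the inlying fit (q_j = 1/p_j + a_j N^-1 a_j^T) and moves back the one with the smallest |w_j| while it passes. Masking (one outlier hiding another) and swamping (clean observations wrongly flagged) are what such a reverse step is meant to counteract.

```python
import numpy as np
from statistics import NormalDist


def fit_inlying(A, y, P, inl):
    """Weighted least squares on the inlying observations only (diagonal weights P)."""
    Ai, Pi = A[inl], P[inl]
    N_inv = np.linalg.inv(Ai.T @ (Pi[:, None] * Ai))  # inverse normal matrix
    x_hat = N_inv @ (Ai.T @ (Pi * y[inl]))
    return x_hat, N_inv


def w_values(A, y, P, inl, cand, sigma0, in_fit):
    """Standardized residuals of observations `cand` w.r.t. the fit on `inl`.

    in_fit=True : cand is part of the fit -> Baarda's w-test, q = 1/p - a N^-1 a^T
    in_fit=False: cand is left out       -> predicted residual, q = 1/p + a N^-1 a^T
    """
    x_hat, N_inv = fit_inlying(A, y, P, inl)
    a = A[cand]
    e = y[cand] - a @ x_hat
    h = np.einsum("ij,jk,ik->i", a, N_inv, a)  # a_i N^-1 a_i^T for each row
    q = 1.0 / P[cand] + (-h if in_fit else h)
    return e / (sigma0 * np.sqrt(q))


def iterative_data_snooping(A, y, P, sigma0=1.0, alpha=0.001):
    """Classic IDS: move the largest rejected |w| from the inlying to the outlying set."""
    k = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    inl, out = list(range(len(y))), []
    while len(inl) > A.shape[1]:  # keep at least one redundant observation
        w = w_values(A, y, P, inl, inl, sigma0, in_fit=True)
        i = int(np.argmax(np.abs(w)))
        if abs(w[i]) <= k:
            break
        out.append(inl.pop(i))
    return sorted(inl), sorted(out)


def data_refining_sketch(A, y, P, inl, out, sigma0=1.0, alpha=0.001):
    """Hypothetical reverse step (not the paper's IDR): move accepted observations
    from the outlying set back to the inlying set, smallest |w| first."""
    k = NormalDist().inv_cdf(1 - alpha / 2)
    inl, out = list(inl), list(out)
    while out:
        w = w_values(A, y, P, inl, out, sigma0, in_fit=False)
        j = int(np.argmin(np.abs(w)))
        if abs(w[j]) > k:
            break
        inl.append(out.pop(j))
    return sorted(inl), sorted(out)


if __name__ == "__main__":
    t = np.arange(10.0)
    A = np.column_stack([np.ones_like(t), t])  # straight line: y = x0 + x1 * t
    y = 1.0 + 2.0 * t + 0.3 * np.sin(1.7 * t)  # small deterministic "noise"
    y[4] += 8.0  # one gross error
    P = np.ones_like(t)
    print(iterative_data_snooping(A, y, P))
    # Start from an over-flagged outlying set {2, 4} to mimic swamping.
    print(data_refining_sketch(A, y, P, [0, 1, 3, 5, 6, 7, 8, 9], [2, 4]))
```

In this toy straight-line example with a single gross error of +8 at index 4, IDS flags only index 4. If a clean observation (index 2) is placed in the outlying set by hand, as swamping might do, the reverse step moves it back and keeps index 4 outlying. Testing left-out observations with the predicted-residual variance (1/p + a N^-1 a^T) rather than the residual variance is what allows observations outside the fit to be assessed against the inlying solution.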
Title: Extended w-test for outlier diagnostics in linear models