The illusion of success: Test set disproportion causes inflated accuracy in remote sensing mapping research, International Journal of Applied Earth Observation and Geoinformation



International Journal of Applied Earth Observation and Geoinformation (IF 7.6), Pub Date: 2024-11-16, DOI: 10.1016/j.jag.2024.104256
Yuanjun Xiao, Zhen Zhao, Jingfeng Huang, Ran Huang, Wei Weng, Gerui Liang, Chang Zhou, Qi Shao, Qiyu Tian

In remote sensing mapping studies, selecting an appropriate test set to accurately evaluate the results is critical. An imprecise accuracy assessment can be misleading and can fail to validate the applicability of mapping products. Starting from the WHU-Hi-HanChuan dataset, this paper revealed how the ratio of positive to negative sample sizes in a test set affects accuracy metrics. It did so by generating a series of test sets with varying positive-to-negative ratios and using each of them to evaluate the same map. A rigorous approach to accuracy assessment was proposed, and an example of tea plantation mapping was used to demonstrate the process and to analyse potential issues in traditional approaches. A scale factor (λ) was constructed to measure the discrepancy between the sample size ratio in a test set and that in actual conditions. Accuracy adjustment formulas based on λ were developed and applied to adjust the reported accuracies of 42 previous maps. Results showed that a higher ratio of positive to negative samples in the test set led to inflated user's accuracy (UA), F1-score (F1) and overall accuracy (OA), but had little impact on producer's accuracy (PA). When the ratio matched that of the target area, UA, F1 and OA closely matched the true values. This indicates that the proportion of positive and negative samples in the test set should be consistent with the actual situation. The accuracies reported by traditional approaches were far from the true accuracy and could not reflect the performance of the map. These approaches include sampling the test set from labelled data and 5-fold cross-validation. Among the 42 previous maps, nearly 60% had UAs overestimated by 10%, and 9.5% had UA and F1 deviations of more than 25%. The conclusions of this study provide a clear caution for future mapping research and help in producing and identifying truly excellent maps.
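The effect of a disproportionate test set, and a λ-style correction, can be illustrated with a small sketch. The abstract does not give the paper's definition of λ or its adjustment formulas, so what follows is an assumption and not the authors' method.

Here λ is taken to be the positive-to-negative sample ratio in the test set divided by the positive-to-negative area ratio in the target area. The correction rescales the negative-sample counts of a binary confusion matrix by λ. This presumes that the per-class error rates measured on the test set carry over to the target area. The names `Confusion`, `metrics`, `scale_factor` and `adjust`, and all numbers in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix counts from a test set (positive = target class)."""
    tp: float  # positive reference samples mapped as positive
    fn: float  # positive reference samples mapped as negative
    fp: float  # negative reference samples mapped as positive
    tn: float  # negative reference samples mapped as negative


def metrics(c: Confusion) -> dict:
    """UA (precision), PA (recall), F1 and OA for the positive class."""
    ua = c.tp / (c.tp + c.fp)
    pa = c.tp / (c.tp + c.fn)
    f1 = 2 * ua * pa / (ua + pa)
    oa = (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)
    return {"UA": ua, "PA": pa, "F1": f1, "OA": oa}


def scale_factor(c: Confusion, true_pos_area: float, true_neg_area: float) -> float:
    """Hypothetical lambda: (positives / negatives in the test set) divided by
    (positive area / negative area in the target region).
    Written as a single ratio of products to avoid extra rounding."""
    pos_test = c.tp + c.fn
    neg_test = c.fp + c.tn
    return (pos_test * true_neg_area) / (neg_test * true_pos_area)


def adjust(c: Confusion, lam: float) -> Confusion:
    """Reweight negative reference samples by lam so that class proportions
    match the target region. Per-class error rates, and hence PA, are unchanged."""
    return Confusion(tp=c.tp, fn=c.fn, fp=c.fp * lam, tn=c.tn * lam)


if __name__ == "__main__":
    # Hypothetical map evaluated on a balanced test set (500 positive, 500 negative samples).
    test = Confusion(tp=475, fn=25, fp=50, tn=450)
    # Hypothetical target region where the positive class covers 1 part in 20.
    lam = scale_factor(test, true_pos_area=1, true_neg_area=19)
    print("lambda =", lam)
    for name, c in (("test set", test), ("adjusted", adjust(test, lam))):
        print(name, {k: f"{v:.4f}" for k, v in metrics(c).items()})
```

In this hypothetical case, λ is 19. The balanced test set reports UA ≈ 0.905, F1 ≈ 0.927 and OA = 0.925. Reweighting to the 1:19 area ratio gives UA ≈ 0.333, F1 ≈ 0.494 and OA = 0.9025, with PA unchanged at 0.95. This mirrors the qualitative pattern described in the abstract.

The direction of the OA change under such reweighting depends on whether the map's PA exceeds its accuracy on negative samples. It is therefore not fixed in general.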


Updated: 2024-11-16
