Why multiple hypothesis test corrections provide poor control of false positives in the real world.
Psychological Methods (IF 7.6), Pub Date: 2024-11-21, DOI: 10.1037/met0000678
Stanley E Lazic

Most scientific disciplines use significance testing to draw conclusions about experimental or observational data. This classical approach provides a theoretical guarantee for controlling the number of false positives across a set of hypothesis tests, making it an appealing framework for scientists seeking to limit the number of false effects or associations that they claim to observe. Unfortunately, this theoretical guarantee applies to few experiments, and the true false positive rate (FPR) is much higher. Scientists have plenty of freedom to choose the error rate to control, the tests to include in the adjustment, and the method of correction, making strong error control difficult to attain. In addition, hypotheses are often tested after finding unexpected relationships or patterns, the data are analyzed in several ways, and analyses may be run repeatedly as data accumulate. As a result, adjusted p values are too small, incorrect conclusions are often reached, and results are harder to reproduce. In the following, I argue why the FPR is rarely controlled meaningfully and why shrinking parameter estimates is preferable to p value adjustments. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
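One of the mechanisms named above, running an analysis repeatedly as data accumulate, can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the look schedule (20, 40, ..., 100 observations), the z-test with known unit variance, and the number of simulated experiments are all assumptions chosen for illustration. Every simulated experiment has a true effect of zero, so any rejection is a false positive.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_fpr(n_sims=2000, looks=(20, 40, 60, 80, 100), alpha=0.05, seed=1):
    """Estimate the false positive rate under a true null effect.

    Returns a pair (single, peeking):
      single  - share of experiments significant at the final look only
      peeking - share of experiments significant at any interim or final look
    All settings here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    single = 0
    peeking = 0
    for _ in range(n_sims):
        total = 0.0  # running sum of observations
        n = 0        # observations collected so far
        rejected_any = False
        p = 1.0
        for look in looks:
            # Collect data up to the next scheduled look.
            while n < look:
                total += rng.gauss(0.0, 1.0)  # null is true: mean 0, sd 1
                n += 1
            # z-test of mean = 0 with known unit variance.
            z = total / math.sqrt(n)
            p = two_sided_p(z)
            if p < alpha:
                rejected_any = True
        if p < alpha:  # p now holds the final-look p value
            single += 1
        if rejected_any:
            peeking += 1
    return single / n_sims, peeking / n_sims


if __name__ == "__main__":
    single, peeking = simulate_fpr()
    print(f"FPR, one planned analysis at the end: {single:.3f}")
    print(f"FPR, stopping at any significant look: {peeking:.3f}")
```

Under these assumptions the single-analysis rate should sit near the nominal 0.05, while the rate for "significant at any look" should be clearly higher, even though each individual test uses the same threshold. A correction applied only to the tests that are eventually reported does not account for these extra looks.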

Updated: 2024-11-21