Reweighting UK Biobank corrects for pervasive selection bias due to volunteering
International Journal of Epidemiology (IF 6.4), Pub Date: 2024-05-08, DOI: 10.1093/ije/dyae054
Sjoerd van Alten, Benjamin W Domingue, Jessica Faul, Titus Galama, Andries T Marees

Background: Biobanks typically rely on volunteer-based sampling. This results in large samples (power) at the cost of representativeness (bias). The problem of volunteer bias is debated. Here, we (i) show that volunteering biases associations in UK Biobank (UKB) and (ii) estimate inverse probability (IP) weights that correct for volunteer bias in UKB.

Methods: Drawing on UK Census data, we constructed a subsample representative of UKB’s target population, which consists of all individuals invited to participate. Based on demographic variables shared between the UK Census and UKB, we estimated IP weights (IPWs) for each UKB participant. We compared 21 weighted and unweighted bivariate associations between these demographic variables to assess volunteer bias.

Results: Volunteer bias in all associations, as naively estimated in UKB, was substantial, in some cases so severe that unweighted estimates had the opposite sign of the association in the target population. For example, older individuals in UKB reported being in better health, in contrast to evidence from the UK Census. Using IPWs in weighted regressions reduced 87% of volunteer bias on average. Volunteer-based sampling reduced the effective sample size of UKB substantially, to 32% of its original size.

Conclusions: Estimates from large-scale biobanks may be misleading due to volunteer bias. We recommend IP weighting to correct for such bias. To aid in the construction of the next generation of biobanks, we provide suggestions on how to best ensure representativeness in a volunteer-based design. For UKB, IPWs have been made available.
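To make the idea of IP weighting and effective sample size concrete, the following is a minimal sketch on toy data. It is not the authors' estimation procedure, which the abstract does not specify beyond its use of demographic variables shared between the UK Census and UKB. The sketch assumes a simple cell-based scheme in which each participant's weight is the population share of their demographic cell divided by that cell's share in the volunteer sample, and it assumes Kish's approximation for the effective sample size. The function names, the two age cells, and all numbers are hypothetical.

```python
from collections import Counter


def ip_weights(sample_cells, population_shares):
    """Cell-based weights: population share of a cell / sample share of that cell.

    Assumption: this ratio stands in for the inverse of the (relative)
    probability of volunteering within each demographic cell.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]


def weighted_mean(values, weights):
    """Weighted average of an outcome."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical toy sample: "young" people are over-represented among volunteers.
cells = ["young"] * 8 + ["old"] * 2
population_shares = {"young": 0.5, "old": 0.5}  # assumed target-population shares
outcome = [1.0] * 8 + [0.0] * 2  # e.g. 1.0 = reports good health

weights = ip_weights(cells, population_shares)
unweighted = sum(outcome) / len(outcome)
weighted = weighted_mean(outcome, weights)
n_eff = effective_sample_size(weights)

print(f"unweighted mean: {unweighted:.3f}")
print(f"weighted mean:   {weighted:.3f}")
print(f"effective n:     {n_eff:.2f} of {len(outcome)}")
```

In this toy case the weighted mean moves toward the value implied by the assumed population shares, and the effective sample size falls below the nominal sample size. This is the same trade-off the abstract reports for UKB, where weighting corrects bias at the cost of precision.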


