Nature Human Behaviour (IF 21.4), Pub Date: 2024-07-04, DOI: 10.1038/s41562-024-01909-5
Caitlin E Carey, Rebecca Shafee, Robbee Wedow, Amanda Elliott, Duncan S Palmer, John Compitello, Masahiro Kanai, Liam Abbott, Patrick Schultz, Konrad J Karczewski, Samuel C Bryant, Caroline M Cusick, Claire Churchhouse, Daniel P Howrigan, Daniel King, George Davey Smith, Benjamin M Neale, Raymond K Walters, Elise B Robinson
Data within biobanks capture broad yet detailed indices of human variation, but biobank-wide insights can be difficult to extract due to complexity and scale. Here, using large-scale factor analysis, we distill hundreds of variables (diagnoses, assessments and survey items) into 35 latent constructs, using data from unrelated individuals with predominantly estimated European genetic ancestry in UK Biobank. These factors recapitulate known disease classifications, disentangle elements of socioeconomic status, highlight the relevance of psychiatric constructs to health and improve measurement of pro-health behaviours. We go on to demonstrate the power of this approach to clarify genetic signal, enhance discovery and identify associations between underlying phenotypic structure and health outcomes. In building a deeper understanding of ways in which constructs such as socioeconomic status, trauma, or physical activity are structured in the dataset, we emphasize the importance of considering the interwoven nature of the human phenome when evaluating public health patterns.
Title: Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation