Definition of metafounders based on population structure analysis, Genetics Selection Evolution



Definition of metafounders based on population structure analysis
Genetics Selection Evolution ( IF 3.6 ) Pub Date : 2024-06-06 , DOI: 10.1186/s12711-024-00913-7
Christine Anglhuber ₁,₂, Christian Edel ₁, Eduardo C G Pimentel ₁, Reiner Emmerling ₁, Kay-Uwe Götz ₁, Georg Thaller ₂


Limitations of the concept of identity by descent in the presence of stratification within a breeding population may lead to an incomplete formulation of the conventional numerator relationship matrix ($$\mathbf{A}$$). Combining $$\mathbf{A}$$ with the genomic relationship matrix ($$\mathbf{G}$$) in a single-step approach for genetic evaluation may then cause inconsistencies that can bias the resulting predictions. The objective of this study was to identify stratification using genomic data and to transfer this information to matrix $$\mathbf{A}$$, in order to improve the compatibility of $$\mathbf{A}$$ and $$\mathbf{G}$$.

We developed an iterative approach based on ADMIXTURE, a software tool for detecting population stratification. First, we identified 2 to 40 strata ($$k$$) with ADMIXTURE, which we then introduced in a stepwise manner into matrix $$\mathbf{A}$$ to generate matrix $$\mathbf{A}^{\boldsymbol{\Gamma}}$$ using the metafounder methodology. Improvements in consistency between $$\mathbf{G}$$ and $$\mathbf{A}^{\boldsymbol{\Gamma}}$$ were evaluated by regression analysis and by comparing the overall mean and mean diagonal values of both matrices. The approach was tested on genotype and pedigree information of 85,249 European and North American Brown Swiss animals. Analyses with ADMIXTURE were initially performed on the full set of genotypes (S1). In addition, we used an alternative dataset in which sampling of closely related animals was avoided (S2).

The regression of standard $$\mathbf{A}$$ on $$\mathbf{G}$$ gave an intercept of −0.489, a slope of 0.780 and a fit of 0.647. For the S1 data, the corresponding values for the regression of $$\mathbf{A}^{\boldsymbol{\Gamma}}$$ on $$\mathbf{G}$$ were −0.028, 1.087 and 0.807 for $$k = 7$$, but there was no clear optimum $$k$$. Analyses of S2 gave a clear optimum at $$k = 24$$, with regression results of −0.020, 0.998 and 0.817. For this $$k$$, differences in the mean and mean diagonal values between both matrices were negligible.

Deriving hidden stratification information from genotyped animals and integrating it into $$\mathbf{A}$$ improved the compatibility of the resulting $$\mathbf{A}^{\boldsymbol{\Gamma}}$$ and $$\mathbf{G}$$ considerably compared to the initial situation. In dairy breeding populations with large half-sib families as sub-structures, it is necessary to balance the data when applying population structure analysis to obtain meaningful results.
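
The consistency checks named in the abstract (regression of a pedigree-based matrix on $$\mathbf{G}$$, plus differences in overall mean and mean diagonal) can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `compare_relationship_matrices`, the use of the upper triangle including the diagonal as regression data, and the reading of "fit" as the coefficient of determination are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def compare_relationship_matrices(A, G):
    """Compare a pedigree-based matrix A (or A^Gamma) with a genomic matrix G.

    Assumption: elements of the upper triangle (diagonal included) are used,
    with A as the dependent and G as the independent variable.
    """
    A = np.asarray(A, dtype=float)
    G = np.asarray(G, dtype=float)
    if A.shape != G.shape or A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("A and G must be square matrices of the same size")

    # Each pair of animals is counted once because both matrices are symmetric.
    upper = np.triu_indices(A.shape[0])
    y = A[upper]
    x = G[upper]

    # Ordinary least squares fit of y = intercept + slope * x.
    slope, intercept = np.polyfit(x, y, 1)

    # Coefficient of determination, taken here as the "fit" of the regression.
    residuals = y - (intercept + slope * x)
    r_squared = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)

    return {
        "intercept": float(intercept),
        "slope": float(slope),
        "r_squared": float(r_squared),
        # Differences in overall mean and mean diagonal (A minus G).
        "mean_diff": float(A.mean() - G.mean()),
        "mean_diag_diff": float(np.diag(A).mean() - np.diag(G).mean()),
    }


if __name__ == "__main__":
    # Toy data only: a random symmetric "G" and a noisy copy standing in for A.
    rng = np.random.default_rng(1)
    M = rng.normal(size=(50, 200))
    G_toy = M @ M.T / 200
    noise = rng.normal(scale=0.02, size=G_toy.shape)
    A_toy = G_toy + (noise + noise.T) / 2
    print(compare_relationship_matrices(A_toy, G_toy))
```

Under these assumptions, two fully compatible matrices would give an intercept near 0, a slope near 1 and mean differences near 0, which is the direction of the improvement reported for $$k = 24$$ on the S2 data.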


Updated: 2024-06-06
