Inferring ancestry with the hierarchical soft clustering approach tangleGen
Genome Research ( IF 6.2 ) Pub Date : 2024-10-21 , DOI: 10.1101/gr.279399.124
Klara Elisabeth Burger, Solveig Klepper, Ulrike von Luxburg, Franz Baumdicker

Understanding the genetic ancestry of populations is central to numerous scientific and societal fields. It contributes to a better understanding of human evolutionary history, advances personalized medicine, aids in forensic identification, and allows individuals to connect to their genealogical roots. Existing methods, such as ADMIXTURE, have significantly improved our ability to infer ancestries. However, these methods typically work with a fixed number of independent ancestral populations. As a result, they provide insight into genetic admixture, but do not include a hierarchical interpretation. In particular, the intricate ancestral population structures remain difficult to unravel. Alternative methods with a consistent inheritance structure, such as hierarchical clustering, may offer benefits in terms of interpreting the inferred ancestries. Here, we present tangleGen, a soft clustering tool that transfers the hierarchical machine learning framework Tangles, which leverages graph theoretical concepts, to the field of population genetics. The hierarchical perspective of tangleGen on the composition and structure of populations improves the interpretability of the inferred ancestral relationships. Moreover, tangleGen adds a new layer of explainability, as it allows identifying the SNPs that are responsible for the clustering structure. We demonstrate the capabilities and benefits of tangleGen for the inference of ancestral relationships, using both simulated data and data from the 1000 Genomes Project.

Updated: 2024-10-22