Inferring Taxonomic Affinities and Genetic Distances Using Morphological Features Extracted from Specimen Images: a Case Study with a Bivalve dataset
Systematic Biology (IF 6.1), Pub Date: 2024-07-22, DOI: 10.1093/sysbio/syae042
Martin Hofmann, Steffen Kiel, Lara M Kösters, Jana Wäldchen, Patrick Mäder
Reconstructing the tree of life and understanding the relationships of taxa are core questions in evolutionary and systematic biology. The main advances in this field in the last decades were derived from molecular phylogenetics; however, for most species, molecular data are not available. Here, we explore the applicability of two deep learning methods – supervised classification approaches and unsupervised similarity learning – to infer organism relationships from specimen images. As a basis, we assembled an image dataset covering 4144 bivalve species belonging to 74 families across all orders and subclasses of the extant Bivalvia, with molecular phylogenetic data being available for all families and a complete taxonomic hierarchy for all species. The suitability of this dataset for deep learning experiments was evidenced by an ablation study resulting in almost 80% accuracy for identifications on the species level. Three sets of experiments were performed using our dataset. First, we included taxonomic hierarchy and genetic distances in a supervised learning approach to obtain predictions on several taxonomic levels simultaneously. Here, we stimulated the model to consider features shared between closely related taxa to be more critical for their classification than features shared with distantly related taxa, imprinting phylogenetic and taxonomic affinities into the architecture and training procedure. Second, we used transfer learning and similarity learning approaches for zero-shot experiments to identify the higher-level taxonomic affinities of test species that the models had not been trained on. The models assigned the unknown species to their respective genera with approximately 48% and 67% accuracy. Lastly, we used unsupervised similarity learning to infer the relatedness of the images without prior knowledge of their taxonomic or phylogenetic affinities. 
The results clearly showed similarities between visual appearance and genetic relationships at the higher taxonomic levels. The correlation was 0.6 for the most species-rich subclass (Imparidentia), ranging from 0.5 to 0.7 for the orders with the most images. Overall, the correlation between visual similarity and genetic distances at the family level was 0.78. However, fine-grained reconstructions based on these observed correlations, such as sister-taxa relationships, require further work. Overall, our results broaden the applicability of automated taxon identification systems and provide a new avenue for estimating phylogenetic relationships from specimen images.
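As a rough illustration of how a family-level correlation between visual similarity and genetic distance could be computed, the following is a minimal sketch, not the authors' procedure. The abstract does not state which correlation coefficient, distance measure, or aggregation was used, so the choices here (per-family centroids of image embeddings, cosine distance, Pearson correlation) and all names (`family_distance_correlation`, `embeddings`, `family_ids`, `genetic_dist`) are assumptions made for the example.

```python
import numpy as np


def family_distance_correlation(embeddings, family_ids, genetic_dist):
    """Pearson correlation between visual and genetic family-level distances.

    embeddings:   (n_images, d) image feature vectors (hypothetical input)
    family_ids:   (n_images,) integer family index per image, in 0..F-1
    genetic_dist: (F, F) symmetric matrix of genetic distances between families
    """
    embeddings = np.asarray(embeddings, dtype=float)
    family_ids = np.asarray(family_ids)
    genetic_dist = np.asarray(genetic_dist, dtype=float)
    n_families = genetic_dist.shape[0]

    # Assumption: one centroid per family, the mean of its image embeddings.
    centroids = np.stack(
        [embeddings[family_ids == f].mean(axis=0) for f in range(n_families)]
    )

    # Assumption: cosine distance between every pair of family centroids.
    unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    visual_dist = 1.0 - unit @ unit.T

    # Use only the upper triangle so each family pair is counted once.
    iu = np.triu_indices(n_families, k=1)
    return float(np.corrcoef(visual_dist[iu], genetic_dist[iu])[0, 1])
```

Because the entries of a distance matrix are not independent, a significance test for such a correlation would usually rely on a permutation scheme (for example a Mantel test) rather than on the standard Pearson p-value.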
Updated: 2024-07-22