Inferring Taxonomic Affinities and Genetic Distances Using Morphological Features Extracted from Specimen Images: a Case Study with a Bivalve dataset
Systematic Biology (IF 6.1), Pub Date: 2024-07-22, DOI: 10.1093/sysbio/syae042
Martin Hofmann 1 , Steffen Kiel 2 , Lara M Kösters 3 , Jana Wäldchen 3, 4 , Patrick Mäder 1, 4, 5

Reconstructing the tree of life and understanding the relationships of taxa are core questions in evolutionary and systematic biology. The main advances in this field in the last decades were derived from molecular phylogenetics; however, for most species, molecular data are not available. Here, we explore the applicability of two deep learning methods – supervised classification approaches and unsupervised similarity learning – to infer organism relationships from specimen images. As a basis, we assembled an image dataset covering 4144 bivalve species belonging to 74 families across all orders and subclasses of the extant Bivalvia, with molecular phylogenetic data being available for all families and a complete taxonomic hierarchy for all species. The suitability of this dataset for deep learning experiments was evidenced by an ablation study resulting in almost 80% accuracy for identifications on the species level. Three sets of experiments were performed using our dataset. First, we included taxonomic hierarchy and genetic distances in a supervised learning approach to obtain predictions on several taxonomic levels simultaneously. Here, we stimulated the model to consider features shared between closely related taxa to be more critical for their classification than features shared with distantly related taxa, imprinting phylogenetic and taxonomic affinities into the architecture and training procedure. Second, we used transfer learning and similarity learning approaches for zero-shot experiments to identify the higher-level taxonomic affinities of test species that the models had not been trained on. The models assigned the unknown species to their respective genera with approximately 48% and 67% accuracy. Lastly, we used unsupervised similarity learning to infer the relatedness of the images without prior knowledge of their taxonomic or phylogenetic affinities. 
The results clearly showed similarities between visual appearance and genetic relationships at the higher taxonomic levels. The correlation was 0.6 for the most species-rich subclass (Imparidentia), ranging from 0.5 to 0.7 for the orders with the most images. Overall, the correlation between visual similarity and genetic distances at the family level was 0.78. However, fine-grained reconstructions based on these observed correlations, such as sister-taxa relationships, require further work. Overall, our results broaden the applicability of automated taxon identification systems and provide a new avenue for estimating phylogenetic relationships from specimen images.
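
The family-level comparison of visual similarity and genetic distance described above can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not say which embedding model, distance measure, or correlation coefficient was used, so the cosine distance between per-family mean embeddings and the Pearson coefficient below are assumptions, and the family names, vectors, and genetic distances are made-up toy values.

```python
import math
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def visual_genetic_correlation(embeddings_by_family, genetic_distance):
    """Correlate pairwise visual distances with pairwise genetic distances.

    embeddings_by_family: family name -> list of image embedding vectors
    genetic_distance: frozenset({family_a, family_b}) -> genetic distance
    """
    centroids = {fam: centroid(vecs) for fam, vecs in embeddings_by_family.items()}
    visual, genetic = [], []
    for fam_a, fam_b in combinations(sorted(centroids), 2):
        visual.append(cosine_distance(centroids[fam_a], centroids[fam_b]))
        genetic.append(genetic_distance[frozenset((fam_a, fam_b))])
    return pearson(visual, genetic)


# Toy, invented data: three hypothetical families with 2-D image embeddings.
embeddings_by_family = {
    "FamilyA": [[1.0, 0.0], [0.9, 0.1]],
    "FamilyB": [[0.8, 0.3], [0.7, 0.4]],
    "FamilyC": [[0.0, 1.0], [0.1, 0.9]],
}
genetic_distance = {
    frozenset(("FamilyA", "FamilyB")): 0.10,
    frozenset(("FamilyA", "FamilyC")): 0.50,
    frozenset(("FamilyB", "FamilyC")): 0.40,
}

r = visual_genetic_correlation(embeddings_by_family, genetic_distance)
print(f"visual vs. genetic correlation: {r:.2f}")
```

Because the entries of a distance matrix are not independent, a permutation-based test such as a Mantel test would be a more defensible way to judge significance than a plain Pearson coefficient; the sketch only shows how the two sets of pairwise distances are lined up.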

Updated: 2024-07-22