SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models
Genome Biology (IF 10.1), Pub Date: 2024-07-25, DOI: 10.1186/s13059-024-03298-4
Salvatore Cosentino, Sira Sriswasdi, Wataru Iwasaki

Accurate inference of orthologous genes constitutes a prerequisite for comparative and evolutionary genomics. SonicParanoid is one of the fastest tools for orthology inference; however, its scalability and accuracy have been hampered by time-consuming all-versus-all alignments and the existence of proteins with complex domain architectures. Here, we present a substantial update of SonicParanoid, where a gradient boosting predictor halves the execution time and a language model doubles the recall. Application to empirical large-scale and standardized benchmark datasets shows that SonicParanoid2 is much faster than comparable methods and also the most accurate. SonicParanoid2 is available at https://gitlab.com/salvo981/sonicparanoid2 and https://zenodo.org/doi/10.5281/zenodo.11371108.


