Accurate assembly of circular RNAs with TERRACE
Genome Research (IF 6.2), Pub Date: 2024-07-26, DOI: 10.1101/gr.279106.124
Tasfia Zahin, Qian Shi, Xiaofei Carl Zang, Mingfu Shao

Circular RNA (circRNA) is a class of RNA molecules that forms a closed loop with its 5' and 3' ends covalently bonded. circRNAs are known to be more stable than linear RNAs, exhibit distinct properties and functions, and have been shown to be promising biomarkers. Existing methods for assembling circRNAs rely heavily on annotated transcriptomes, and therefore their accuracy is unsatisfactory when a high-quality transcriptome is not available. We present TERRACE, a new algorithm for full-length assembly of circRNAs from paired-end total RNA-seq data. TERRACE uses the splice graph as the underlying data structure that organizes splicing and coverage information. We transform the problem of assembling circRNAs into finding paths that "bridge" the three fragments in the splice graph induced by back-spliced reads. We adopt a definition of optimal bridging paths and a dynamic programming algorithm to compute such paths. TERRACE features an efficient algorithm to detect back-spliced reads missed by RNA-seq aligners, which contributes to its much improved sensitivity. It also incorporates a new machine-learning approach trained to assign a confidence score to each assembled circRNA, which is shown to be superior to scoring by abundance. On both simulated and biological datasets, TERRACE consistently outperforms existing methods by a large margin in sensitivity while maintaining better or comparable precision. In particular, when annotations are not provided, TERRACE assembles 123%-413% more correct circRNAs than state-of-the-art methods. TERRACE represents a major leap in assembling full-length circRNAs from RNA-seq data, and we expect it to be widely used in downstream circRNA research.
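The abstract does not state TERRACE's exact criterion for an optimal bridging path, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a splice graph whose vertices are exon segments numbered in genomic order (so every edge goes from a lower to a higher index and the graph is a DAG), edges weighted by the number of reads supporting each junction, and "optimal" taken to mean the path whose weakest edge is strongest (maximum bottleneck). The names `bridge` and `assemble`, the graph layout, and the scoring rule are hypothetical illustrations, not TERRACE's actual implementation.

```python
from typing import Dict, List, Optional

# Hypothetical splice graph (assumption, not TERRACE's data structure):
# vertices 0..n-1 are exon segments sorted by genomic position, so every
# edge u -> v has u < v. edges[u][v] = reads supporting junction u -> v.
Graph = Dict[int, Dict[int, int]]


def bridge(edges: Graph, src: int, dst: int) -> Optional[List[int]]:
    """Return the src->dst path maximizing its minimum edge weight, or None.

    Assumed optimality criterion: maximum bottleneck (strongest weakest edge).
    """
    if src == dst:
        return [src]
    best: Dict[int, float] = {src: float("inf")}  # best bottleneck reaching v
    back: Dict[int, int] = {}                      # predecessor on best path
    # Index order is a topological order, so best[u] is final when u is visited.
    for u in range(src, dst):
        if u not in best:
            continue
        for v, w in edges.get(u, {}).items():
            if v > dst:
                continue  # the path must not overshoot the target fragment
            score = min(best[u], w)
            if score > best.get(v, float("-inf")):
                best[v] = score
                back[v] = u
    if dst not in best:
        return None  # dst unreachable from src
    path = [dst]
    while path[-1] != src:
        path.append(back[path[-1]])
    return path[::-1]


def assemble(edges: Graph, fragments: List[List[int]]) -> Optional[List[int]]:
    """Join fragments (vertex lists, sorted by position) via bridging paths.

    Assumes consecutive fragments do not overlap beyond a shared endpoint.
    """
    path = list(fragments[0])
    for frag in fragments[1:]:
        link = bridge(edges, path[-1], frag[0])
        if link is None:
            return None
        path.extend(link[1:])  # link[0] is already path[-1]
        path.extend(frag[1:])  # frag[0] is link[-1]
    return path


# Toy example: two routes from 0 to 3 with bottlenecks 4 (via 1) and 2 (via 2).
example_edges: Graph = {0: {1: 5, 2: 2}, 1: {3: 4}, 2: {3: 9}, 3: {4: 6}}
print(bridge(example_edges, 0, 3))                   # [0, 1, 3]
print(assemble(example_edges, [[0], [3], [4]]))      # [0, 1, 3, 4]
```

On the toy graph, the route 0→1→3 wins over 0→2→3 because its weakest junction has 4 supporting reads versus 2. Visiting vertices in genomic (topological) order guarantees each vertex's best score is final before it is extended, so a single forward pass suffices; a real assembler would likely use a richer score than a single bottleneck value.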

Updated: 2024-07-26