Nature Machine Intelligence (IF 18.8), Pub Date: 2024-12-09, DOI: 10.1038/s42256-024-00946-z
Haopeng Yu, Heng Yang, Wenqing Sun, Zongyun Yan, Xiaofei Yang, Huakun Zhang, Yiliang Ding, Ke Li
The complex ‘language’ of plant RNA encodes a vast array of biological regulatory elements that orchestrate crucial aspects of plant growth, development and adaptation to environmental stresses. Recent advancements in foundation models (FMs) have demonstrated their unprecedented potential to decipher complex ‘language’ in biology. In this study, we introduced PlantRNA-FM, a high-performance and interpretable RNA FM specifically designed for plants. PlantRNA-FM was pretrained on an extensive dataset, integrating RNA sequences and RNA structure information from 1,124 distinct plant species. PlantRNA-FM exhibits superior performance in plant-specific downstream tasks. PlantRNA-FM achieves an F1 score of 0.974 for genic region annotation, whereas the current best-performing model achieves 0.639. Our PlantRNA-FM is empowered by our interpretable framework that facilitates the identification of biologically functional RNA sequence and structure motifs, including both RNA secondary and tertiary structure motifs across transcriptomes. Through experimental validations, we revealed translation-associated RNA motifs in plants. Our PlantRNA-FM also highlighted the importance of the position information of these functional RNA motifs in genic regions. Taken together, our PlantRNA-FM facilitates the exploration of functional RNA motifs across the complexity of transcriptomes, empowering plant scientists with capabilities for programming RNA codes in plants.
An interpretable RNA foundation model for exploring functional RNA motifs in plants