Molecular Therapy - Nucleic Acids (IF 6.5), Pub Date: 2021-02-18, DOI: 10.1016/j.omtn.2021.02.014. Siguo Wang, Qinhu Zhang, Zhen Shen, Ying He, Zhen-Heng Chen, Jianqiang Li, De-Shuang Huang
Transcriptional regulation is fundamental to molecular biology, yet it remains difficult to study. Recent research has shown that the double-helix structure of DNA plays an important role in improving the accuracy and interpretability of transcription factor binding site (TFBS) prediction. Although several computational methods consider DNA sequence and DNA shape features simultaneously, designing an efficient model remains an open problem. In this paper, we propose CRPTS, a hybrid convolutional and recurrent neural network (CNN/RNN) architecture that predicts TFBSs by combining DNA sequence and DNA shape features. The novelty of the method rests on three aspects: (1) a shared hybrid CNN and RNN efficiently extracts features from large-scale genomic sequences obtained by high-throughput technology; (2) common patterns are found in DNA sequences and their corresponding DNA shape features; (3) CRPTS can capture local structural information of DNA sequences without relying completely on DNA shape data. Comprehensive experiments on 66 in vitro datasets derived from universal protein binding microarrays (uPBMs) show that CRPTS clearly outperforms state-of-the-art methods.
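To make the idea of a shared encoder concrete, the following is a minimal sketch in plain Python. It is not the CRPTS implementation. It applies one set of convolution kernels to both a one-hot DNA sequence and a shape-feature track, then concatenates the pooled outputs. The kernel weights, the toy shape values, and the choice of four shape channels (so that the same kernels fit both inputs) are assumptions made for illustration, and the recurrent (RNN) stage is omitted entirely.

```python
from typing import List

Matrix = List[List[float]]  # L rows (positions) x C columns (channels)

BASES = "ACGT"


def one_hot(seq: str) -> Matrix:
    """Encode a DNA string as an L x 4 one-hot matrix (columns A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x: Matrix, kernel: Matrix) -> List[float]:
    """Valid 1D cross-correlation of an L x C input with a k x C kernel, then ReLU."""
    k = len(kernel)
    channels = len(kernel[0])
    out = []
    for start in range(len(x) - k + 1):
        score = sum(
            x[start + i][c] * kernel[i][c]
            for i in range(k)
            for c in range(channels)
        )
        out.append(max(0.0, score))
    return out


def shared_encoder(x: Matrix, kernels: List[Matrix]) -> List[float]:
    """Apply the SAME kernels to any L x 4 input and global-max-pool each feature map."""
    return [max(conv1d_relu(x, kernel)) for kernel in kernels]


# Hypothetical kernels: exact-match detectors for "ACG" and "TTT".
kernels = [one_hot("ACG"), one_hot("TTT")]

# Sequence branch: one-hot encoding of a toy 7-mer.
seq_features = shared_encoder(one_hot("TTACGTT"), kernels)

# Shape branch: made-up values for four shape channels at each of 7 positions.
shape_track = [[0.1, 0.2, 0.3, 0.4] for _ in range(7)]
shape_features = shared_encoder(shape_track, kernels)

# Combined representation that a downstream classifier could consume.
combined = seq_features + shape_features
print(combined)
```

Because both branches reuse the same kernels, any pattern the encoder learns is available to both the sequence and the shape input. A real model would learn the kernel weights from data and pass the feature maps through a recurrent layer before classification.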
Title (translated from the Chinese): Predicting transcription factor binding sites using DNA shape features based on a shared hybrid deep learning architecture