Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models
Nature Machine Intelligence (IF 18.8), Pub Date: 2024-06-21, DOI: 10.1038/s42256-024-00851-5
Evan E. Seitz , David M. McCandlish , Justin B. Kinney , Peter K. Koo

Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. However, elucidating underlying biological mechanisms from genomic DNNs remains challenging. Existing interpretability methods, such as attribution maps, have their origins in non-biological machine learning applications and therefore have the potential to be improved by incorporating domain-specific interpretation strategies. Here we introduce SQUID (Surrogate Quantitative Interpretability for Deepnets), a genomic DNN interpretability framework based on domain-specific surrogate modelling. SQUID approximates genomic DNNs in user-specified regions of sequence space using surrogate models—simpler quantitative models that have inherently interpretable mathematical forms. SQUID leverages domain knowledge to model cis-regulatory mechanisms in genomic DNNs, in particular by removing the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation. Benchmarking analysis on multiple genomic DNNs shows that SQUID, when compared to established interpretability methods, identifies motifs that are more consistent across genomic loci and yields improved single-nucleotide variant-effect predictions. SQUID also supports surrogate models that quantify epistatic interactions within and between cis-regulatory elements, as well as global explanations of cis-regulatory mechanisms across sequence contexts. SQUID thus advances the ability to mechanistically interpret genomic DNNs.
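To make the surrogate-modelling idea concrete, the sketch below illustrates the general workflow the abstract describes: generate an in silico mutagenesis library in a local region of sequence space around a sequence of interest, score the library with a genomic DNN, and fit a simpler, inherently interpretable additive model to the DNN's predictions. This is a minimal illustration only, not the SQUID API; the stand-in `dnn_predict`, the `mutagenize` helper, and all parameters are hypothetical, and the published framework fits richer surrogate models that additionally account for nonlinearities and heteroscedastic noise rather than plain least squares.

```python
# Minimal sketch of surrogate modelling for a genomic DNN (illustrative only, not SQUID).
import numpy as np

ALPHABET = "ACGT"
SEQ_LEN = 50
rng = np.random.default_rng(0)

# Weights for the stand-in "DNN" below; in practice this would be a trained model.
_W = rng.normal(size=SEQ_LEN * 4)


def one_hot(seq):
    """Encode a DNA string as a flattened (L*4,) one-hot vector."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, ALPHABET.index(base)] = 1.0
    return x.ravel()


def dnn_predict(seqs):
    """Stand-in for a trained genomic DNN; replace with a real model's predict()."""
    return np.array([one_hot(s) @ _W for s in seqs])


def mutagenize(wild_type, n_seqs, mut_rate=0.1):
    """Sample sequences in a local region of sequence space around the wild type."""
    library = []
    for _ in range(n_seqs):
        seq = list(wild_type)
        for i in range(len(seq)):
            if rng.random() < mut_rate:
                seq[i] = rng.choice(list(ALPHABET))
        library.append("".join(seq))
    return library


# 1. Build a local in silico mutagenesis library and score it with the DNN.
wild_type = "".join(rng.choice(list(ALPHABET), size=SEQ_LEN))
library = mutagenize(wild_type, n_seqs=2000)
y = dnn_predict(library)

# 2. Fit an additive surrogate y ≈ X·theta by least squares. The fitted coefficients
#    form an (L x 4) matrix of per-position nucleotide effects, i.e. an inherently
#    interpretable approximation of the DNN's behaviour in this region of sequence space.
X = np.stack([one_hot(s) for s in library])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
additive_effects = theta.reshape(SEQ_LEN, 4)

print("Per-position effect matrix shape:", additive_effects.shape)
```

The per-position effect matrix plays the role of an interpretable motif-like representation; surrogate models with pairwise terms would, by extension, expose epistatic interactions within and between cis-regulatory elements.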




Updated: 2024-06-21