Deep Learning and Likelihood Approaches for Viral Phylogeography Converge on the Same Answers Whether the Inference Model Is Right or Wrong
Systematic Biology (IF 6.1) Pub Date: 2024-01-08, DOI: 10.1093/sysbio/syad074
Ammon Thompson 1, Benjamin J Liebeskind 2, Erik J Scully 2, Michael J Landis 3
Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared the robustness to model misspecification of a trained neural network with that of a Bayesian method. We found that both methods had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression, which we demonstrate has patterns of sensitivity to model misspecification similar to those of Bayesian highest posterior density (HPD) intervals; its intervals greatly overlap with HPDs but have lower precision (are more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training.
Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
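The abstract names conformalized quantile regression as its uncertainty quantification method but does not spell out the calibration step. The following is a minimal sketch of that step in its standard split-conformal form, not the authors' implementation. The function names (`cqr_offset`, `simulate`, `toy_quantiles`), the synthetic data, and the fixed-width quantile bands are all assumptions made for illustration; in the paper's setting the lower and upper quantiles would presumably come from a trained neural network rather than a hand-written rule.

```python
import numpy as np


def cqr_offset(lo_cal, hi_cal, y_cal, alpha=0.1):
    """Return the conformal offset that widens (or narrows) quantile intervals.

    lo_cal, hi_cal: predicted lower/upper quantiles on a held-out calibration set.
    y_cal: true values on the same calibration set.
    alpha: target miscoverage rate (0.1 aims for 90% coverage).
    """
    # Conformity score: how far each true value falls outside its interval
    # (negative when the value lies strictly inside the interval).
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    # Finite-sample corrected rank of the score to use as the offset.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points to guarantee coverage at this alpha.
        return np.inf
    return np.sort(scores)[k - 1]


rng = np.random.default_rng(0)


def simulate(n):
    # Hypothetical data: y = x + standard normal noise.
    x = rng.uniform(0.0, 5.0, size=n)
    y = x + rng.normal(0.0, 1.0, size=n)
    return x, y


def toy_quantiles(x):
    # Stand-in for a quantile regressor; deliberately too narrow for 90% coverage.
    return x - 0.5, x + 0.5


# Calibrate on held-out data.
x_cal, y_cal = simulate(1000)
lo_cal, hi_cal = toy_quantiles(x_cal)
offset = cqr_offset(lo_cal, hi_cal, y_cal, alpha=0.1)

# Apply the same offset to new predictions and check empirical coverage.
x_new, y_new = simulate(5000)
lo_new, hi_new = toy_quantiles(x_new)
lo_adj, hi_adj = lo_new - offset, hi_new + offset
coverage = np.mean((y_new >= lo_adj) & (y_new <= hi_adj))
print(f"offset = {offset:.3f}, empirical coverage = {coverage:.3f}")
```

Because the offset is estimated from data the quantile model never saw, the adjusted intervals carry a marginal coverage guarantee regardless of how well the underlying model fits, which is consistent with the abstract's remark that these intervals tend to be more conservative than HPDs.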
Updated: 2024-01-08