Cost-Efficient Evaluation of Molecular Generative Models' Generalizability in de novo Drug Design via a Double-Parameter Mathematical Framework
ChemRxiv Pub Date: 2025-01-02, DOI: 10.26434/chemrxiv-2025-n0qpb
Peng Gao, Jie Zhang, Zhilian Dai, Yangyang Deng

Deep generative models are increasingly crucial in de novo drug design, where they excel at rapidly exploring vast chemical spaces to advance molecular design. By integrating techniques such as adversarial networks, reinforcement learning, and transfer learning, these models can effectively leverage both public datasets and collected experimental data. However, accurately assessing the generalization ability of these models for drug discovery applications while minimizing cost remains challenging. Developing an accurate yet cost-effective solution would provide substantial benefits to both academia and industry. In this study, we propose three accuracy-based methods for predicting the theoretical coverage of generative models, classifying them into three cost levels: low, medium, and high. For all these methods, the derivative of a unique curve with respect to sample iterations can serve as the most cost-effective yet reliable metric for evaluating generalization ability. To address sampling non-uniformity, we propose a novel double-parameter mathematical model that accurately fits both experimental and theoretical coverage across various generative architectures. Furthermore, the developed model provides qualitative insights into how transfer learning and reinforcement learning influence generative models' performance by examining changes resulting from increased non-uniformity and enhanced probabilities of sampling target molecules.
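
The abstract does not state the double-parameter model itself, so the following is only an illustrative sketch of the underlying idea of a coverage curve and its slope, not the authors' method. It assumes molecules are drawn independently with replacement from a fixed distribution over a finite target set, in which case the expected fraction of distinct targets seen after n draws is the mean of 1 - (1 - p_i)^n. The function names and the Zipf-like skew used to mimic sampling non-uniformity are hypothetical choices made for illustration.

```python
import math


def expected_coverage(probs, n):
    """Expected fraction of distinct targets seen after n independent draws."""
    return sum(1.0 - (1.0 - p) ** n for p in probs) / len(probs)


def coverage_slope(probs, n):
    """Derivative of the expected coverage with respect to n (n treated as continuous)."""
    return sum(-((1.0 - p) ** n) * math.log1p(-p) for p in probs if p < 1.0) / len(probs)


def zipf_like(num_targets, skew):
    """Hypothetical non-uniform distribution: the weight of rank r is r ** (-skew)."""
    weights = [r ** (-skew) for r in range(1, num_targets + 1)]
    total = sum(weights)
    return [w / total for w in weights]


NUM_TARGETS = 1000  # assumed size of the target set, chosen only for illustration
uniform = [1.0 / NUM_TARGETS] * NUM_TARGETS
skewed = zipf_like(NUM_TARGETS, 1.0)

for n in (100, 1000, 10000):
    print(
        n,
        round(expected_coverage(uniform, n), 4),
        round(expected_coverage(skewed, n), 4),
        coverage_slope(uniform, n),
        coverage_slope(skewed, n),
    )
```

Under these assumptions, a skewed sampler covers fewer distinct targets than a uniform one for the same number of draws, and the slope of the curve at a given n indicates how much new coverage further sampling is still expected to yield.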

Updated: 2025-01-02