Enabling Data-Driven Solubility Modeling at GSK: Enhancing Purge Predictions for Mutagenic Impurities


Organic Process Research & Development (IF 3.1), Pub Date: 2024-11-11, DOI: 10.1021/acs.oprd.4c00384
Luigi Da Vià, Matthias Depoortere, Robert D. Willacy, Alastair J. Roberts, Pandian Sokkar, Mathieu Fossépré, Andrew Ruba, Magdalena A. Zwierzyna

In the pharmaceutical industry, solubility is a critical parameter influencing various stages of drug development, from early discovery to commercial manufacturing. This work showcases a high-throughput solubility screening workflow and describes the steps required to standardize and curate the data so that it supports automated data flow. Using this high-quality data, we developed a quantitative structure–property relationship model based on gradient boosting and molecular descriptors, which requires only a 2D molecular structure to generate predictions. The accuracy of the model is competitive with alternative approaches where additional physical data is not required. A key use case for solubility predictions made in this way is in developing control strategies for mutagenic impurities, where they provide a data-driven and consistent method for calculating the solubility contribution to purge calculations. Further perspective is given on the future application of the model as a solubility prediction algorithm and on data-driven methodologies supporting drug development in general, highlighting the potential of federated learning approaches, which use technological means to overcome the barrier to cross-industry data sharing.


Updated: 2024-11-12
