kMoL: an open-source machine and federated learning library for drug discovery
Journal of Cheminformatics (IF 7.1), Pub Date: 2025-02-25, DOI: 10.1186/s13321-025-00967-9
Romeo Cozac 1, Haris Hasic 1, Jun Jin Choong 1, Vincent Richard 1, Loic Beheshti 1, Cyrille Froehlich 1, Takuto Koyama 2, Shigeyuki Matsumoto 2, Ryosuke Kojima 2, Hiroaki Iwata 2, Aki Hasegawa 2, Takao Otsuka 2, Yasushi Okuno 2

Machine learning is quickly becoming integral to drug discovery pipelines, particularly quantitative structure-activity relationship (QSAR) and absorption, distribution, metabolism, and excretion (ADME) tasks. Graph Convolutional Network (GCN) models have proven especially promising due to their inherent ability to model molecular structures using graph-based representations. However, maximizing the potential of such models in practice is challenging, as companies prioritize data privacy and security over collaboration initiatives to improve model performance and robustness. kMoL is an open-source machine learning library with integrated federated learning capabilities developed to address such challenges. Its key features include state-of-the-art model architectures, Bayesian optimization, explainability, and federated learning mechanisms. It demonstrates extensive customization possibilities, advanced security features, straightforward implementation of user-specific models, and high adaptability to custom datasets without additional programming requirements. kMoL is evaluated through locally trained benchmark settings and distributed federated learning experiments using various datasets to assess the features and flexibility of the library, as well as the ability to facilitate fast and practical experimentation. Additionally, results of these experiments provide further insights into the performance trade-offs associated with federated learning strategies, presenting valuable guidance for deploying machine learning models in a privacy-preserving manner within drug discovery pipelines. kMoL is available on GitHub at https://github.com/elix-tech/kmol.

Scientific contribution

The primary scientific contribution of this research project is the introduction and evaluation of kMoL, an open-source machine learning library with integrated federated learning capabilities. By demonstrating advanced customization and security capabilities without additional programming requirements, kMoL represents an accessible yet secure open-source platform for collaborative drug discovery projects. Additionally, the experiment results provide further insights into the performance trade-offs associated with federated learning strategies, presenting valuable guidance for deploying machine learning models in a privacy-preserving manner within drug discovery pipelines.
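
The abstract does not say which aggregation strategy kMoL uses, so the following is only a generic illustration of the federated learning idea it refers to: each participant trains locally and shares model parameters, not data. It is a minimal sketch of federated averaging (FedAvg), assuming flat parameter vectors and weighting by local dataset size. The function `federated_average` and all values are hypothetical and are not part of the kMoL API.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Generic FedAvg-style aggregation; an illustrative assumption, not
    kMoL's actual implementation.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")

    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's contribution to the global model
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged


# Hypothetical example: three organizations train the same model locally
# and send only parameters (never molecules or assay data) to the server.
local_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
local_sizes = [100, 100, 200]
global_params = federated_average(local_params, local_sizes)
print(global_params)
```

Weighting by dataset size lets a participant with more local data influence the shared model proportionally. This is one common design choice among several, and other strategies trade off performance and privacy differently.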

Updated: 2025-02-25