Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow
Journal of Cheminformatics (IF 7.1), Pub Date: 2024-08-16, DOI: 10.1186/s13321-024-00894-1
José T Moreira-Filho 1, Dhruv Ranganath 2, Mike Conway 3, Charles Schmitt 4, Nicole Kleinstreuer 1, Kamel Mansouri 1

With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.

Scientific contributions

This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills. This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.
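
The stages named in the abstract can be illustrated outside KNIME with a minimal Python sketch. It is not the authors' workflow, only a toy analogue under stated assumptions: the descriptor table and its column names (`logP`, `MW`, `HBD`, `const`) are invented for illustration, feature selection is reduced to dropping constant descriptors, the unsupervised method is a plain k-means, the hyperparameter search is a scan over the number of groups scored by silhouette, and dimensionality reduction is omitted. The "interpretation" step simply reports the descriptor whose standardized group mean lies farthest from the overall mean.

```python
import math

def drop_constant(rows, names, tol=1e-12):
    """Feature selection (simplified): drop descriptors that never vary."""
    cols = list(zip(*rows))
    keep = [j for j, c in enumerate(cols) if max(c) - min(c) > tol]
    return [[r[j] for j in keep] for r in rows], [names[j] for j in keep]

def standardize(rows):
    """Scale every descriptor to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(rows, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(r, centers[c])) for r in rows]
        updated = []
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(rows, labels):
    """Mean silhouette width; singleton groups contribute 0."""
    groups = set(labels)
    if len(groups) < 2:
        return -1.0
    scores = []
    for i, r in enumerate(rows):
        own = [dist(r, q) for j, q in enumerate(rows)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(dist(r, q) for q, l in zip(rows, labels) if l == g)
                / labels.count(g) for g in groups if g != labels[i])
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

def summarize(rows, labels, names):
    """One plain-language line per group, naming its most distinctive descriptor."""
    lines = []
    for g in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == g]
        means = [sum(c) / len(c) for c in zip(*members)]
        j = max(range(len(names)), key=lambda i: abs(means[i]))
        level = "high" if means[j] > 0 else "low"
        lines.append(f"Group {g}: {len(members)} chemicals, "
                     f"mainly distinguished by {level} {names[j]}")
    return lines

# Invented descriptor table (illustrative values only, not from the paper).
names = ["logP", "MW", "HBD", "const"]
table = [[1.0, 100.0, 0.0, 1.0], [1.2, 110.0, 0.0, 1.0], [0.9, 105.0, 1.0, 1.0],
         [4.0, 300.0, 3.0, 1.0], [4.2, 310.0, 4.0, 1.0], [3.9, 305.0, 3.0, 1.0]]

selected, names = drop_constant(table, names)
z = standardize(selected)
scores = {k: silhouette(z, kmeans(z, k)) for k in (2, 3, 4)}  # hyperparameter scan
best_k = max(scores, key=scores.get)
labels = kmeans(z, best_k)
summary = summarize(z, labels, names)
print(best_k, labels)
print("\n".join(summary))
```

On real data the descriptor table would come from a descriptor calculator rather than being typed in, and the scan range for the number of groups is a choice the analyst makes, not something fixed by the method.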

Updated: 2024-08-17