Hilbert-curve assisted structure embedding method
Journal of Cheminformatics (IF 7.1), Pub Date: 2024-07-29, DOI: 10.1186/s13321-024-00850-z
Gergely Zahoránszky-Kőhalmi¹, Kanny K Wan¹, Alexander G Godfrey¹

Chemical space embedding methods are widely used in research for dimensionality reduction, clustering, and effective visualization. The maps produced by the embedding process can give medicinal chemists valuable insight into the relationships between the structural, physicochemical, and biological properties of compounds. However, these maps are known to be difficult to interpret, and the "landscape" of a map is prone to "rearrangement" when different sets of compounds are embedded. In this study we present the Hilbert-Curve Assisted Space Embedding (HCASE) method, which creates maps by organizing structures according to a logic familiar to medicinal chemists. First, a chemical space is defined by a set of "reference scaffolds". These scaffolds are sorted with the medicinal-chemistry-inspired Scaffold-Key algorithm from prior work. Next, the ordered scaffolds are mapped onto a line that is folded into a higher-dimensional (here, 2D) space; this intricately folded line is referred to as a pseudo-Hilbert-Curve. A compound is embedded by finding its most similar reference scaffold on the pseudo-Hilbert-Curve and taking that scaffold's position. Through a series of experiments, we demonstrate the properties of the maps generated by the HCASE method. The embedded compounds came from the DrugBank and CANVASS libraries, and the chemical spaces were defined by scaffolds extracted from the ChEMBL database. The novelty of the HCASE method lies in generating robust, intuitive chemical space embeddings that reflect a medicinal chemist's reasoning, and in the first use of a space-filling (Hilbert) curve for this purpose. https://github.com/ncats/hcase
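The embedding procedure summarized above (order the reference scaffolds, lay them along a Hilbert curve, place each compound at its nearest scaffold's cell) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the HCASE implementation from the repository. The Scaffold-Key algorithm is not reproduced here: each scaffold and compound is assumed to already be a numeric key vector, sorted lexicographically. Manhattan distance on those vectors stands in for the similarity measure, and spreading the N ordered scaffolds evenly over a 2^k × 2^k Hilbert grid (`d = i * cells // N`) is just one plausible way to build a pseudo-Hilbert layout. The names `hilbert_d2xy`, `build_reference_map`, and `embed`, and the toy keys, are hypothetical. `hilbert_d2xy` follows the standard index-to-coordinate conversion for a Hilbert curve.

```python
from __future__ import annotations

from typing import List, Tuple

Key = Tuple[float, ...]   # hypothetical numeric key vector standing in for a Scaffold-Key
Cell = Tuple[int, int]    # (x, y) grid cell on the 2D map


def hilbert_d2xy(n: int, d: int) -> Cell:
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def build_reference_map(ref_keys: List[Key]) -> List[Tuple[Key, Cell]]:
    """Order reference scaffolds, then lay them out along a Hilbert curve."""
    ordered = sorted(ref_keys)           # assumption: lexicographic order of key vectors
    n = 1
    while n * n < len(ordered):          # smallest power-of-2 grid with enough cells
        n *= 2
    cells = n * n
    layout = []
    for i, key in enumerate(ordered):
        d = i * cells // len(ordered)    # assumption: spread scaffolds evenly along the curve
        layout.append((key, hilbert_d2xy(n, d)))
    return layout


def manhattan(a: Key, b: Key) -> float:
    """Stand-in dissimilarity between two key vectors (assumption)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def embed(compound_key: Key, layout: List[Tuple[Key, Cell]]) -> Cell:
    """Place a compound at the cell of its closest reference scaffold."""
    _, cell = min(layout, key=lambda kc: manhattan(kc[0], compound_key))
    return cell


if __name__ == "__main__":
    refs = [(3, 1, 0), (1, 0, 0), (2, 2, 1), (1, 1, 0), (4, 0, 2)]  # toy keys, made up
    layout = build_reference_map(refs)
    for key, cell in layout:
        print(key, "->", cell)
    print("compound (2, 2, 0) ->", embed((2, 2, 0), layout))
```

The design relies on the locality of the Hilbert curve: consecutive curve positions are adjacent grid cells, so scaffolds that sit close together in the sorted order tend to land close together on the map. Because the layout depends only on the reference scaffolds, embedding a different compound set reuses the same positions rather than rearranging the map.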

Updated: 2024-07-29