Molecular identification via molecular fingerprint extraction from atomic force microscopy images
Journal of Cheminformatics (IF 7.1), Pub Date: 2024-11-25, DOI: 10.1186/s13321-024-00921-1
Manuel González Lastre, Pablo Pou, Miguel Wiche, Daniel Ebeling, Andre Schirmeisen, Rubén Pérez

Non-contact atomic force microscopy with CO-functionalized metal tips (referred to as HR-AFM) provides access to the internal structure of individual molecules adsorbed on a surface with unprecedented resolution. Previous works have shown that deep learning (DL) models can retrieve the chemical and structural information encoded in a 3D stack of constant-height HR-AFM images, leading to molecular identification. In this work, we overcome their limitations by using a well-established description of the molecular structure in terms of topological fingerprints, the 1024-bit Extended Connectivity Chemical Fingerprints of radius 2 (ECFP4), which were developed for substructure and similarity searching. ECFPs provide local structural information about the molecule, with each bit correlating with a particular substructure within it. Our DL model is able to extract this optimized structural descriptor from the 3D HR-AFM stacks and use it, through virtual screening, to identify molecules from their predicted ECFP4 with a retrieval accuracy of 95.4% on theoretical images. Furthermore, this approach, unlike previous DL models, assigns a confidence score, the Tanimoto similarity, to each of the candidate molecules, thus providing information on the reliability of the identification. By construction, ECFPs lose the number of times a given substructure occurs in the molecule during the hashing process that makes them usable in machine learning applications. We show that it is possible to complement the fingerprint-based virtual screening with global information provided by another DL model that predicts the chemical formula from the same HR-AFM stacks, boosting the identification accuracy to 97.6%. Finally, we perform a limited test with experimental images, obtaining promising results towards the application of this pipeline under real conditions.
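The loss of substructure counts mentioned in the abstract can be illustrated with a minimal sketch. It assumes substructures are already represented by integer identifiers (the values below are hypothetical, not real ECFP identifiers) and folds them into a 1024-bit vector with a modulo operation. Real ECFP implementations derive these identifiers by iteratively hashing atom environments, which this toy example does not attempt to reproduce.

```python
from collections import Counter
from typing import FrozenSet, Iterable

N_BITS = 1024  # fingerprint length quoted in the abstract


def fold_to_bits(substructure_ids: Iterable[int], n_bits: int = N_BITS) -> FrozenSet[int]:
    """Map substructure identifiers to the set of on-bit positions.

    A bit is either on or off, so repeated substructures collapse to one bit.
    """
    return frozenset(s % n_bits for s in substructure_ids)


# Hypothetical identifiers: the second molecule repeats one substructure three times.
mol_short = [40321, 7718, 90210]
mol_long = [40321, 7718, 7718, 7718, 90210]

counts_short = Counter(mol_short)
counts_long = Counter(mol_long)
bits_short = fold_to_bits(mol_short)
bits_long = fold_to_bits(mol_long)

print(counts_short != counts_long)  # the per-substructure counts differ
print(bits_short == bits_long)      # the folded bit sets are identical
```

Two molecules that differ only in how often a substructure repeats end up with the same bit set, which is why a global descriptor such as the chemical formula can help tell them apart.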
Scientific contribution: Previous works on molecular identification from AFM images used chemical descriptors that were intuitive for humans but suboptimal for neural networks. We propose a novel method to extract the ECFP4 from AFM images and identify the molecule via virtual screening, outperforming previous state-of-the-art models.
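The screening step can be sketched as follows, under stated assumptions: fingerprints are stored as sets of on-bit positions, the candidate library and its formulas are made-up toy data, and the chemical formula is applied as a simple exact-match filter. The abstract does not specify how the authors combine the two models, so the filter is an illustrative choice rather than their method; `screen` and `tanimoto` are hypothetical helper names.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Fingerprint = FrozenSet[int]  # positions of the on-bits in a 1024-bit vector


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by the on-bits present in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(
    predicted: Fingerprint,
    library: Dict[str, Tuple[Fingerprint, str]],
    predicted_formula: Optional[str] = None,
) -> List[Tuple[str, float]]:
    """Rank library molecules by Tanimoto similarity to the predicted fingerprint.

    If a predicted chemical formula is given, only candidates with that formula
    are kept (an assumed, deliberately simple way to combine the two models).
    """
    scored = [
        (name, tanimoto(predicted, fp))
        for name, (fp, formula) in library.items()
        if predicted_formula is None or formula == predicted_formula
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy library with made-up bit positions and formulas (not real ECFP4 data).
library = {
    "cand_A": (frozenset({3, 17, 256, 900}), "C10H22"),
    "cand_B": (frozenset({3, 17, 256, 900}), "C11H24"),  # same bits, different formula
    "cand_C": (frozenset({3, 42, 512}), "C6H6O"),
}
predicted_fp = frozenset({3, 17, 256, 900})

by_fingerprint = screen(predicted_fp, library)
with_formula = screen(predicted_fp, library, predicted_formula="C11H24")
print(by_fingerprint)
print(with_formula)
```

In this toy case the fingerprint alone leaves two candidates tied at a similarity of 1.0, and the formula filter singles out one of them. The similarity returned with each candidate plays the role of the confidence score described in the abstract.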

Updated: 2024-11-25