MPOCSR: optical chemical structure recognition based on multi-path Vision Transformer, Complex & Intelligent Systems


Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-07-22, DOI: 10.1007/s40747-024-01561-6
Fan Lin, Jianhua Li

Optical chemical structure recognition (OCSR) is a fundamental task in chemistry that aims to transform intricate chemical structure images into machine-readable formats. Current deep learning-based OCSR methods typically use image feature extractors to extract visual features and employ encoder-decoder architectures for chemical structure recognition. However, the performance of these methods is limited by their image feature extractors and by the class imbalance of elements in chemical structure representations. This paper proposes MPOCSR (multi-path optical chemical structure recognition), which introduces the multi-path Vision Transformer (MPViT) and the class-balanced (CB) loss function to address these two challenges. MPOCSR uses MPViT as its image feature extractor, combining the advantages of convolutional neural networks and Vision Transformers and thereby providing richer visual information to the subsequent decoding process. Furthermore, MPOCSR incorporates the CB loss function to rebalance the loss weights among different categories. To train and validate the method, we constructed a dataset that includes both Markush and non-Markush structures. Experimental results show that MPOCSR achieves an accuracy of 90.95% on the test set, surpassing other existing methods.
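The abstract does not say how the CB loss computes its weights. As a rough illustration only, the sketch below assumes the common "effective number of samples" formulation, in which a class with n training samples gets a weight proportional to (1 - beta) / (1 - beta^n). The function names, the value of beta, and the token counts are all hypothetical and are not taken from the paper.

```python
import math


def class_balanced_weights(counts, beta=0.999):
    """Per-class weights (1 - beta) / (1 - beta**n), rescaled to sum to the class count.

    Assumes every count is at least 1; a zero count would divide by zero.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def cb_cross_entropy(logits, target, weights):
    """Weighted cross-entropy for one token: weights[target] * -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return weights[target] * (log_z - logits[target])


# Hypothetical token frequencies: carbon tokens dominate, bromine tokens are rare.
token_counts = {"C": 100000, "O": 20000, "N": 15000, "Br": 300}
weights = class_balanced_weights(list(token_counts.values()))

# A rare token contributes more loss than a frequent one for the same prediction.
uniform_logits = [0.0, 0.0, 0.0, 0.0]
loss_common = cb_cross_entropy(uniform_logits, 0, weights)
loss_rare = cb_cross_entropy(uniform_logits, 3, weights)
```

Under these assumptions the rare "Br" token receives a larger weight than "C", so errors on infrequent elements are penalized more heavily during training, which is the rebalancing effect the abstract describes.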


Updated: 2024-07-23
