ZeoReader: Automated extraction of synthesis steps from zeolite synthesis literature for autonomous experiments
Chemical Engineering Science (IF 4.1), Pub Date: 2024-11-07, DOI: 10.1016/j.ces.2024.120916
Song He, Wenli Du, Xin Peng, Xin Li
Material synthesis literature documents detailed synthesis procedures, which provide valuable insight and guidance for designing practical synthesis routes. Information extraction (IE) techniques have emerged as powerful tools for obtaining structured synthesis-related data. However, current IE methods struggle to differentiate semantically similar experimental records and to extract dense experimental properties expressed in abstract terms, which limits their effectiveness in the zeolite synthesis domain. To this end, we propose ZeoReader, an end-to-end IE framework designed to extract synthesis steps from zeolite synthesis literature. Specifically, to distinguish between semantically similar descriptions of synthesis and characterization experiments, ZeoReader constructs a MatSciBERT-based paragraph classifier that offers rich prior synthesis knowledge. To improve the extraction of complete synthesis steps from complex sentences, ZeoReader develops a two-stage synthesis step extraction model, which introduces customized contrastive learning to model the distributions of dense properties and capture the features of abstract expressions. Furthermore, domain-specific parsing strategies are proposed that enable ZeoReader to automatically parse PDF documents, identify synthesis experimental passages, and extract structured zeolite synthesis steps containing actions and their corresponding experimental properties. Extensive experiments demonstrate that ZeoReader detects synthesis passages with an accuracy of 94.06% on out-of-sample documents and extracts experimental actions and properties with F1 scores of 93.05% and 74.99%, respectively. The proposed IE framework can be embedded in autonomous unmanned zeolite synthesis experiments to rapidly understand, reproduce, and validate existing experimental routes, thus facilitating the exploration of new zeolites.
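To make the reported metrics concrete, the following is a minimal sketch of what a structured synthesis step (an action plus its experimental properties) might look like and how an exact-match F1 score could be computed separately for actions and properties. The abstract does not specify ZeoReader's output schema or its evaluation protocol, so the `SynthesisStep` class, the helper functions, and the toy data are all illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisStep:
    """Hypothetical schema: one action and its (name, value) properties."""
    action: str
    properties: tuple = ()


def f1_score(predicted: set, gold: set) -> float:
    """Exact-match F1 between a predicted set and a gold set of items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def action_items(steps):
    """Index each action by its step position so repeated actions stay distinct."""
    return {(i, step.action) for i, step in enumerate(steps)}


def property_items(steps):
    """Flatten every (name, value) property, tagged with its step position."""
    return {
        (i, name, value)
        for i, step in enumerate(steps)
        for name, value in step.properties
    }


# Toy data (invented for illustration): the prediction misses one property.
gold_steps = [
    SynthesisStep("stir", (("duration", "2 h"),)),
    SynthesisStep("heat", (("temperature", "100 C"), ("duration", "24 h"))),
]
predicted_steps = [
    SynthesisStep("stir", (("duration", "2 h"),)),
    SynthesisStep("heat", (("temperature", "100 C"),)),
]

action_f1 = f1_score(action_items(predicted_steps), action_items(gold_steps))
property_f1 = f1_score(property_items(predicted_steps), property_items(gold_steps))
print(f"action F1 = {action_f1:.2f}, property F1 = {property_f1:.2f}")
```

In this toy case every action is recovered but one of three properties is missed, so the property score falls below the action score. This mirrors the direction of the gap reported in the abstract, although the real evaluation may match items differently (for example, at the token or span level).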
Updated: 2024-11-07