ReLMM: Reinforcement Learning Optimizes Feature Selection in Modeling Materials.
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2024-12-17, DOI: 10.1021/acs.jcim.4c01934
Maitreyee Sharma Priyadarshini, Nikhil Kumar Thota, Rigoberto Hernandez

A challenge in materials discovery is identifying the physical features that are most correlated with a given target material property, without redundancy. Such variables necessarily comprise the optimal search domain in subsequent material design. Here, we introduce a reinforcement learning-based material model (ReLMM) as a tool for analyzing a given database to identify a minimal or near-minimal subset of physical features for the design of a material with a given target property. We aim for minimality in the size of the selected subset (smaller being better) while maintaining the desired prediction accuracy. Using synthetic multiscale data sets, we show that ReLMM can identify the relative importance of features, and thus help identify which should be selected across scales. In the context of semiconducting materials, ReLMM can be used to improve the prediction of the band gap by identifying which features should be selected in model building. For metal halide perovskites, ReLMM was seen to find a near-minimal data set at least as well as, if not better than, state-of-the-art feature selection tools such as LASSO and XGBoost. We also found that our domain-science-oriented approach can be used to uncover the hierarchical structure of a material from a database consisting of molecular-scale, mesoscale, and device-scale features and labels, complementing an earlier hierarchical model called NestedAE.
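
To make the stated objective concrete (a feature subset that is as small as possible while keeping prediction accuracy), the following is a minimal sketch and not the ReLMM algorithm itself, whose details the abstract does not give. It assumes a generic tabular Q-learning agent that toggles features on and off, a synthetic data set in which only features 0 and 2 matter, and a hypothetical reward equal to the in-sample R^2 of a least-squares fit minus a per-feature penalty. All names (`reward`, `toggle`, `q_learn`, `SIZE_PENALTY`) are illustrative assumptions.

```python
from functools import lru_cache

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 5 candidate features, but the target
# depends on features 0 and 2 alone.
n_samples, n_features = 300, 5
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=n_samples)

SIZE_PENALTY = 0.02  # hypothetical cost charged per selected feature


@lru_cache(maxsize=None)
def reward(mask):
    """In-sample R^2 of a least-squares fit on the selected features,
    minus a penalty proportional to the subset size."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an intercept-only model explains no variance
    A = np.column_stack([X[:, cols], np.ones(n_samples)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return r2 - SIZE_PENALTY * len(cols)


def toggle(mask, i):
    """Return a copy of the mask with feature i flipped in or out."""
    flipped = list(mask)
    flipped[i] = 1 - flipped[i]
    return tuple(flipped)


def q_learn(episodes=500, steps=8, alpha=0.5, gamma=0.9, epsilon=0.3):
    """Tabular Q-learning over feature masks (assumed setup, not the paper's).

    State: the current mask. Action: toggle one feature.
    Step reward: the change in the penalized score.
    Returns the best mask visited and its score.
    """
    q = {}  # mask -> array of action values
    empty = (0,) * n_features
    best_mask, best_score = empty, reward(empty)
    for _ in range(episodes):
        mask = empty
        for _ in range(steps):
            values = q.setdefault(mask, np.zeros(n_features))
            if rng.random() < epsilon:
                action = int(rng.integers(n_features))  # explore
            else:
                action = int(np.argmax(values))  # exploit
            nxt = toggle(mask, action)
            gain = reward(nxt) - reward(mask)
            nxt_values = q.setdefault(nxt, np.zeros(n_features))
            values[action] += alpha * (gain + gamma * nxt_values.max() - values[action])
            if reward(nxt) > best_score:
                best_mask, best_score = nxt, reward(nxt)
            mask = nxt
    return best_mask, best_score


best_mask, best_score = q_learn()
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("penalized score:", round(best_score, 3))
```

In this toy setting the penalty makes the two-feature subset score higher than the full feature set, which is the size-versus-accuracy trade-off described above. A LASSO or XGBoost baseline, as mentioned in the abstract, would instead rank features through coefficient shrinkage or tree-based importances.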

Updated: 2024-12-17