Computational Materials Science (IF 3.1), Pub Date: 2022-05-02, DOI: 10.1016/j.commatsci.2022.111476
Smarak Rath, G. Sudha Priyanga, N. Nagappan, Tiju Thomas
An approach that quickly identifies the compositions most likely to be direct band gap materials would significantly accelerate research on light-harvesting materials. Inorganic perovskites are attractive for this purpose because they offer both compositional flexibility and stability. Here, ABX3 inorganic perovskites (A and B are cations and X is an anion) are classified into direct band gap and indirect band gap materials using the XGBoost (eXtreme Gradient Boosting) classifier. We use a dataset of 1528 ABX3 compounds (X = O, F, Cl, Br, I, S, Se, Te, N, or P), each labeled with the nature of its band gap (direct or indirect). All data are taken from the Materials Project database. Descriptors for these materials are generated using the Matminer Python package. Ten-fold cross-validation with the XGBoost classifier gives an average accuracy of 72.8%. To generate a confusion matrix, the dataset is split once more into a training set and a testing set after cross-validation, and the confusion matrix is computed for that particular test set. The precision for predicting direct band gap materials is 81%, i.e., 81% of the materials predicted to have a direct band gap actually have one. Thus, machine learning can be an effective tool for discovering novel direct band gap perovskites. Finally, SHAP (SHapley Additive exPlanations) analysis is performed to determine the most important descriptors. One key insight from the SHAP analysis is that the absence of transition metals, and of group IIIA to VIIIA elements with atomic number greater than 20, increases the probability that the perovskite has a direct band gap.
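The evaluation protocol described in the abstract (ten-fold splitting, then a confusion matrix and precision on a held-out test set) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names (`k_fold_indices`, `confusion_counts`, `precision`, `accuracy`) and the toy labels are hypothetical, and no classifier is trained here. Only the dataset size of 1528 comes from the abstract. Labels use 1 for direct band gap and 0 for indirect band gap.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and return k (train, test) index splits."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    splits = []
    for i, test in enumerate(folds):
        # Training indices are all folds except the i-th one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives/negatives, with 1 = direct, 0 = indirect."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def precision(counts: Dict[str, int]) -> float:
    """Fraction of materials predicted direct that are actually direct."""
    predicted_positive = counts["tp"] + counts["fp"]
    return counts["tp"] / predicted_positive if predicted_positive else 0.0


def accuracy(counts: Dict[str, int]) -> float:
    """Fraction of all predictions that match the true label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Hypothetical toy labels for illustration only (NOT data from the paper).
y_true = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, precision(counts), accuracy(counts))

# Ten-fold split over a dataset of the size quoted in the abstract.
splits = k_fold_indices(1528, k=10)
print([len(test) for _, test in splits])
```

In a real workflow these hand-written helpers would likely be replaced by library routines such as `xgboost.XGBClassifier`, `sklearn.model_selection.cross_val_score`, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.precision_score`. The abstract does not say which routines or hyperparameters the authors used, so any such choice is an assumption.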
Discovery of direct band gap perovskites for light harvesting using machine learning