k-Means Clustering in Fingerprint-Based Configuration Selection for Fitting Interatomic Potentials.
Journal of Chemical Theory and Computation (IF 5.7), Pub Date: 2024-11-19, DOI: 10.1021/acs.jctc.4c01225
Miroslav Lebeda, Jan Drahokoupil, Ludvík Löbel, Petr Vlčák

In this study, we present a method for selecting an arbitrary number of distinct configurations from a larger data set by applying k-means clustering to atomistic configuration fingerprints based on the CrystalNN model and the radial distribution function (RDF). This approach improves the accuracy of fitting classical molecular dynamics interatomic potentials to density functional theory (DFT) data for both energies and forces while requiring fewer configurations than random selection. We demonstrate this improvement by fitting an embedded-atom method (EAM) potential for titanium, using subsets of various sizes drawn from an initial set of 1800 configurations. With a smaller number of configurations, k-means clustering consistently achieves better precision and lower standard deviations than random selection. The results also suggest that only about 30 configurations are sufficient to obtain an EAM model that describes the energies and forces of the full set of 1800 configurations well. Additionally, the t-distributed stochastic neighbor embedding (t-SNE) method was used to reduce the configuration fingerprints to 2D space, and it revealed an overlap between two configuration subsets, one with and one without a Ti vacancy, indicating similar atomic environments. This similarity is captured by k-means clustering but not by random selection. Furthermore, when the overlapping configurations with vacancies were excluded from the k-means algorithm and used only as a test set, their energy and force predictions showed precision similar to that obtained when they were included. This indicates that overlapping configurations in the 2D t-SNE space do imply potential information redundancy among the atomistic configurations.
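
The selection idea described in the abstract can be illustrated with a minimal sketch in plain Python. It rests on several assumptions that the abstract does not confirm: each configuration is already represented by a precomputed numeric fingerprint vector (the CrystalNN and RDF fingerprint construction is not reproduced here), the representative of each cluster is taken to be the member closest to the cluster centroid, and the centroids are seeded with a deterministic farthest-point rule chosen only for reproducibility. The function names `squared_distance`, `farthest_point_init`, `kmeans`, and `select_representatives` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two fingerprint vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption made for this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(far))
    return centroids


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


def select_representatives(fingerprints, k):
    """Return sorted indices of one configuration per non-empty cluster:
    the member whose fingerprint lies closest to the cluster centroid."""
    centroids, labels = kmeans(fingerprints, k)
    selected = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            selected.append(
                min(
                    members,
                    key=lambda i: squared_distance(fingerprints[i], centroids[c]),
                )
            )
    return sorted(selected)


if __name__ == "__main__":
    # Toy 2D "fingerprints" forming two well-separated groups.
    toy = [[0.0, 0.0], [0.1, 0.0], [4.0, 4.0], [4.1, 4.0]]
    print(select_representatives(toy, k=2))
```

In this sketch, near-duplicate configurations fall into the same cluster and contribute only one representative, which is the intuition behind needing fewer configurations than random selection. A production workflow would more likely rely on an established clustering library and on the fingerprints actually used in the paper.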

Updated: 2024-11-19