Thermodynamics-inspired explanations of artificial intelligence

Nature Communications (IF 14.7), Pub Date: 2024-09-09, DOI: 10.1038/s41467-024-51970-x. Shams Mehdi 1, Pratyush Tiwary 2,3
In recent years, predictive machine learning models have gained prominence across various scientific domains. Their black-box nature, however, necessitates establishing trust in them before accepting their predictions as accurate. One promising strategy involves employing explanation techniques that elucidate the rationale behind a model's predictions in a way that humans can understand, yet assessing the degree of human interpretability of these explanations is itself a nontrivial challenge. In this work, we introduce interpretation entropy as a universal solution for evaluating the human interpretability of any linear model. Using this concept and drawing inspiration from classical thermodynamics, we present Thermodynamics-inspired Explainable Representations of AI and other black-box Paradigms (TERP), a method for generating optimally human-interpretable explanations in a model-agnostic manner. We demonstrate the wide-ranging applicability of this method by explaining predictions from various black-box model architectures across diverse domains, including molecular simulations, text, and image classification.
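To make the central concept more concrete, the sketch below shows one plausible way an interpretation entropy could be computed for a linear explanation model. This is an illustrative assumption based on the abstract alone, not the authors' published implementation: the entropy is taken here as the Shannon entropy of the normalized absolute feature weights, so that sparse explanations (importance concentrated on a few features) score low and are deemed more human-interpretable.

    import numpy as np

    def interpretation_entropy(weights):
        """Shannon entropy of the normalized absolute weights of a linear model.

        Low entropy: importance concentrated on a few features (more
        human-interpretable). High entropy: importance spread across many
        features. The normalization used here is an assumption for
        illustration, inferred from the abstract.
        """
        w = np.abs(np.asarray(weights, dtype=float))
        total = w.sum()
        if total == 0.0:
            return 0.0
        p = w / total            # feature-importance distribution
        p = p[p > 0.0]           # convention: 0 * log(0) = 0
        return float(-np.sum(p * np.log(p)))

    # A sparse explanation scores lower (better) than a diffuse one:
    print(interpretation_entropy([0.9, 0.05, 0.05]))   # ~0.39
    print(interpretation_entropy([1/3, 1/3, 1/3]))     # ~1.10 (= ln 3)

In the thermodynamic analogy suggested by the title, such an entropy term would presumably be traded off against how faithfully the linear surrogate reproduces the black-box prediction, much as a free energy balances energy against entropy; that reading is likewise inferred from the abstract rather than taken from the paper's methods.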