Inherently interpretable machine learning for credit scoring: Optimal classification tree with hyperplane splits
European Journal of Operational Research (IF 6.0), Pub Date: 2024-11-09, DOI: 10.1016/j.ejor.2024.10.046. Jiancheng Tu, Zhibin Wu
An accurate and interpretable credit scoring model plays a crucial role in helping financial institutions reduce losses by promptly detecting, containing, and preventing defaulters. However, existing models often face a trade-off between interpretability and predictive accuracy. Traditional models like Logistic Regression (LR) offer high interpretability but may have limited predictive performance, while more complex models may improve accuracy at the expense of interpretability. In this paper, we tackle the credit scoring problem with imbalanced data by proposing two new classification models based on the optimal classification tree with hyperplane splits (OCT-H). OCT-H provides transparency and easy interpretation with 'if-then' decision tree rules. The first model is the cost-sensitive optimal classification tree with hyperplane splits (CSOCT-H). The second model, the optimal classification tree with hyperplane splits based on maximizing the F1-score (OCT-H-F1), directly maximizes the F1-score. To enhance model scalability, we introduce a data sample reduction method using data binning and feature selection. We then propose two solution methods: a heuristic approach and a method utilizing warm-start techniques to accelerate the solving process. We evaluated the proposed models on four public datasets. The results show that OCT-H significantly outperforms traditional interpretable models, such as Decision Trees (DT) and Logistic Regression (LR), in both predictive performance and interpretability. On certain datasets, OCT-H performs as well as or better than advanced ensemble tree models, effectively narrowing the gap between interpretable models and black-box models.
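To make the two ideas in the abstract concrete, the sketch below illustrates (not the paper's exact mixed-integer formulation) what a single hyperplane split `w·x ≥ b` looks like, scored once with class-dependent misclassification costs (the cost-sensitive view behind CSOCT-H) and once by F1-score (the objective behind OCT-H-F1). The synthetic credit data, the cost values, and the crude grid search standing in for the exact solver are all assumptions for illustration only.

```python
import numpy as np

# Synthetic, imbalanced "credit" data (assumption: made up for illustration).
rng = np.random.default_rng(0)
X_good = rng.normal([2, 2], 0.5, size=(40, 2))   # non-defaulters (y = 0), majority
X_bad = rng.normal([0, 0], 0.5, size=(10, 2))    # defaulters (y = 1), minority
X = np.vstack([X_good, X_bad])
y = np.array([0] * 40 + [1] * 10)

def weighted_cost(w, b, X, y, c_fn=5.0, c_fp=1.0):
    """Cost of the rule 'predict default if w.x >= b'.
    Missing a defaulter (false negative) costs c_fn; a false alarm costs c_fp.
    The 5:1 ratio is an illustrative assumption, not from the paper."""
    pred = (X @ w >= b).astype(int)
    fn = np.sum((pred == 0) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    return c_fn * fn + c_fp * fp

def f1_score(w, b, X, y):
    """F1-score of the same hyperplane rule (the OCT-H-F1 objective)."""
    pred = (X @ w >= b).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# A crude grid search over unit-normal hyperplanes stands in for the
# exact optimization the paper solves; it minimizes the weighted cost.
best_w, best_b = min(
    ((np.array([np.cos(a), np.sin(a)]), b)
     for a in np.linspace(0, 2 * np.pi, 72)
     for b in np.linspace(-3, 3, 61)),
    key=lambda p: weighted_cost(p[0], p[1], X, y),
)
print(f"rule: if {best_w[0]:.2f}*x1 + {best_w[1]:.2f}*x2 >= {best_b:.2f} then default")
print(f"weighted cost = {weighted_cost(best_w, best_b, X, y):.1f}, "
      f"F1 = {f1_score(best_w, best_b, X, y):.2f}")
```

The resulting rule is a single readable 'if-then' statement over a linear combination of features, which is the interpretability argument for hyperplane splits: one such node can replace a cascade of axis-aligned splits.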
Updated: 2024-11-09