Estimating non-overfitted convex production technologies: A stochastic machine learning approach, European Journal of Operational Research



European Journal of Operational Research (IF 6.0), Pub Date: 2024-11-28, DOI: 10.1016/j.ejor.2024.11.030
Maria D. Guillen, Vincent Charles, Juan Aparicio

Overfitting is a classical statistical issue that occurs when a model fits a particular observed data sample too closely, potentially limiting its generalizability. While Data Envelopment Analysis (DEA) is a powerful non-parametric method for assessing the relative efficiency of decision-making units (DMUs), its reliance on the minimal extrapolation principle can lead to concerns about overfitting, particularly when the goal extends beyond evaluating the specific DMUs in the sample to making broader inferences. In this paper, we propose an adaptation of Stochastic Gradient Boosting to estimate production possibility sets that mitigate overfitting while satisfying shape constraints such as convexity and free disposability. Our approach is not intended to replace DEA but to complement it, offering an additional tool for scenarios where generalization is important. Through simulation experiments, we demonstrate that the proposed method performs well compared to DEA, especially in high-dimensional settings. Furthermore, the new machine learning-based technique is compared to the Corrected Concave Non-parametric Least Squares (C2NLS), showing competitive performance. We also illustrate how the usual efficiency measures in DEA can be implemented under our approach. Finally, we provide an empirical example based on data from the Program for International Student Assessment (PISA) to demonstrate the applicability of the new method.
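To make the overfitting concern concrete, the following is a minimal sketch of the standard DEA baseline that the abstract contrasts against, not the authors' boosting-based method. It assumes a single input and a single output under variable returns to scale, and computes the tightest convex, freely disposable frontier by brute force. The function name `dea_frontier` and the four data points are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def dea_frontier(x0, xs, ys):
    """Largest output attainable at input level x0 under convexity and
    free disposability (single input, single output, variable returns)."""
    # Free disposability: any observed unit using no more input than x0
    # is attainable at x0.
    best = max((y for x, y in zip(xs, ys) if x <= x0), default=None)
    if best is None:
        raise ValueError("x0 is below every observed input level")
    # Convexity: mix one unit below x0 with one unit above x0 so that the
    # mixture uses exactly x0 units of input.
    for a, b in combinations(zip(xs, ys), 2):
        lo, hi = (a, b) if a[0] <= b[0] else (b, a)
        if lo[0] < x0 < hi[0]:
            w = (x0 - lo[0]) / (hi[0] - lo[0])
            best = max(best, (1 - w) * lo[1] + w * hi[1])
    return best


# Hypothetical sample of four decision-making units (input, output).
xs = [1, 2, 4, 3]
ys = [1, 3, 4, 2]

# Output-oriented score: frontier output divided by observed output.
# A score of 1 means the unit lies on the estimated frontier.
scores = [dea_frontier(x, xs, ys) / y for x, y in zip(xs, ys)]
print(scores)
```

In this toy sample, three of the four units receive a score of exactly 1 because the estimated frontier passes through them. That is the minimal extrapolation principle at work: the frontier envelops the observed data as tightly as the shape constraints allow, which is why it may track a particular sample too closely.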


Updated: 2024-11-28
