Estimation of discrete choice models considering simultaneously multiple objectives and complex data characteristics
Transportation Research Part C: Emerging Technologies ( IF 7.6 ) Pub Date : 2024-02-14 , DOI: 10.1016/j.trc.2024.104517
Prithvi Bhat Beeramoole , Ryan Kelly , Md Mazharul Haque , Alban Pinz , Alexander Paz

This paper focuses on the discrete choice estimation problem, which involves multiple objectives and the testing of a broad range of hypotheses that can affect both interpretability and prediction accuracy. Previous studies have proposed mathematical programming formulations to assist with hypothesis testing and estimation. However, little is known about the effect of in-sample and out-of-sample model performance criteria during the search for parsimonious specifications. To address this knowledge gap, a multi-objective optimization framework is proposed that includes both in-sample goodness-of-fit and out-of-sample predictive accuracy. The framework generates multiple unique specifications and performs extensive hypothesis testing, simultaneously considering potential explanatory variables, their functional forms, nonlinearities, heterogeneous effects, and correlations. A metaheuristic was designed and implemented to solve the proposed multi-objective nonlinear mixed-integer mathematical programming problem. Experiments, including various datasets and discrete choices, were used to illustrate the efficacy of the proposed framework. The goal was to find specifications that are either similar to or dominate those reported in the literature, considering both interpretability and prediction accuracy. The proposed framework captured important insights regarding potential explanatory factors and heterogeneous preferences that were not reported in the literature. In addition, for one of the datasets used in this study, the proposed framework enabled the discovery of three distinct clusters in terms of specification type and model performance (interpretability and prediction accuracy). For that dataset, these clusters suggest that the proposed approach allowed extensive exploration of the data across different specification types.
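The notion of one specification "dominating" another can be illustrated with a small sketch. It assumes two objectives that are both minimized (for instance an in-sample criterion and an out-of-sample error); the specification names and scores are invented for illustration, and this is not the paper's metaheuristic, only the non-dominated filter that a multi-objective search typically relies on.

```python
from typing import List, Tuple

# Each candidate specification carries two objectives, both minimized:
# (label, in-sample criterion, out-of-sample error). All values are made up.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Hypothetical scores for four candidate specifications.
candidates: List[Candidate] = [
    ("spec_A", 2100.0, 0.31),
    ("spec_B", 1950.0, 0.35),
    ("spec_C", 2200.0, 0.36),
    ("spec_D", 2050.0, 0.33),
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

A specification that survives this filter trades off the two criteria against the others rather than losing on both, which is the sense in which a set of "multiple unique specifications" can be reported instead of a single best model.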
In addition, Mixed-Logit models with correlated parameters were found to perform significantly better in terms of in-sample fit than those without correlation. Similarly, multinomial-Logit models showed the worst in-sample performance for the given dataset. In contrast, multinomial-Logit models provided superior out-of-sample fit relative to advanced specifications, which illustrates the trade-off between in-sample and out-of-sample fit. A comparative analysis, including multiple performance measures, was also conducted. The results suggest that evaluating models using either in-sample Bayesian Information Criterion (BIC) with out-of-sample Mean Absolute Error (MAE), or in-sample BIC with out-of-sample Mean Squared Error (MSE), enables estimation of specifications with better in- and out-of-sample performance than those estimated using maximum log-likelihood and minimum number of model parameters. In addition, a mostly linear relationship was observed between in-sample and out-of-sample log-likelihood, indicating that the latter does not provide much additional information regarding prediction compared to the in-sample estimates. These results show the value of using an optimization framework to support modelling decisions by enabling extensive hypothesis testing and by including multiple performance criteria as well as complex data characteristics to discover important and reliable insights.
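As a rough sketch of the performance measures named above, the following computes multinomial-logit choice probabilities, the log-likelihood, BIC (using the standard form k ln n - 2 LL), and MAE/MSE. The abstract does not say how the out-of-sample errors are defined; here they are assumed to be the errors between predicted probabilities and one-hot indicators of the observed choice. The utilities and choices are toy values, not data from the paper.

```python
import math
from typing import List, Sequence, Tuple


def mnl_probabilities(utilities: Sequence[float]) -> List[float]:
    """Multinomial-logit (softmax) probabilities for one observation."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def log_likelihood(probs: Sequence[Sequence[float]], chosen: Sequence[int]) -> float:
    """Sum of log probabilities of the chosen alternatives."""
    return sum(math.log(p[c]) for p, c in zip(probs, chosen))


def bic(ll: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * LL (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * ll


def mae_mse(probs: Sequence[Sequence[float]], chosen: Sequence[int]) -> Tuple[float, float]:
    """MAE and MSE between predicted probabilities and one-hot observed choices.

    This error definition is an assumption made for illustration.
    """
    abs_sum = sq_sum = 0.0
    count = 0
    for p, c in zip(probs, chosen):
        for j, pj in enumerate(p):
            err = pj - (1.0 if j == c else 0.0)
            abs_sum += abs(err)
            sq_sum += err * err
            count += 1
    return abs_sum / count, sq_sum / count


# Toy systematic utilities (rows = observations, columns = alternatives).
train_utilities = [[0.2, 0.0, -0.4], [1.1, 0.3, 0.0], [-0.5, 0.4, 0.1]]
train_choices = [0, 0, 1]
holdout_utilities = [[0.0, 0.6, -0.2], [0.9, 0.1, 0.0]]
holdout_choices = [1, 0]

train_probs = [mnl_probabilities(u) for u in train_utilities]
holdout_probs = [mnl_probabilities(u) for u in holdout_utilities]

in_sample_bic = bic(log_likelihood(train_probs, train_choices), n_params=2, n_obs=3)
out_mae, out_mse = mae_mse(holdout_probs, holdout_choices)
print(in_sample_bic, out_mae, out_mse)
```

Pairing `in_sample_bic` with `out_mae` (or with `out_mse`) gives one point in the two-objective space for a candidate specification; the parameter count of 2 is a placeholder.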

Updated: 2024-02-14