An approach to assess the role of features in detection of transportation modes
Transportation (IF 3.5), Pub Date: 2024-05-18, DOI: 10.1007/s11116-024-10492-7
Sajjad Sowlati, Rahim Ali Abbaspour, Alireza Chehreghan

One of the fundamental prerequisites for interpreting collected passive travel data to develop intelligent transportation systems is the detection of transportation modes. The literature divides transportation mode detection into two parts: feature extraction and the implementation of classification models. Selecting and employing influential features helps maximize the power of the classification model. However, the interpretation and identification of influential features, which is the focus of this study, has received less attention. Importantly, the influence of features varies with the nature of the input data and the choice of classification model. In many cases, the extracted features are interdependent, and their combined correlation significantly affects specific outcomes. Consequently, evaluating the effectiveness of each feature in isolation may not produce accurate results, which calls for alternative methodologies. This study seeks to bridge these gaps through a comprehensive investigation. Three open-source datasets, Geolife, MTL Trajet 2017, and MTL Trajet 2016, were used to enhance reliability, validate the approach, and investigate how influential features vary under different data collection conditions. First, various features were extracted and grouped into kinematic, spatial, and contextual categories. Then, three powerful classification models (Random Forest, LightGBM, and XGBoost) were applied. A hybrid feature selection algorithm was employed to select a subset of features and analyze how influential features vary across classification models. The algorithm removed more than half of the features, namely those with minimal or negative impact, thereby simplifying the classification process.
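The abstract groups features into kinematic, spatial, and contextual categories but does not define them here. As a minimal sketch, assuming each trip segment is a time-ordered list of `(timestamp_s, lat, lon)` GPS fixes, the code below computes three kinematic features. The function names, the haversine distance, and defining average velocity as total distance divided by total duration are assumptions for illustration, not the authors' definitions.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def kinematic_features(points: List[Tuple[float, float, float]]) -> Dict[str, float]:
    """Illustrative kinematic features for one segment of (timestamp_s, lat, lon) fixes.

    Assumed definitions for this sketch; they are not taken from the paper.
    """
    if len(points) < 3:
        raise ValueError("need at least three GPS fixes")
    times = [p[0] for p in points]
    if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
        raise ValueError("timestamps must be strictly increasing")

    speeds, mid_times = [], []  # point-to-point speeds (m/s) and their midpoint times
    total_dist = 0.0
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        d = haversine_m(la0, lo0, la1, lo1)
        total_dist += d
        speeds.append(d / (t1 - t0))
        mid_times.append((t0 + t1) / 2)

    # Accelerations between consecutive speed estimates (m/s^2).
    accels = [(v1 - v0) / (m1 - m0)
              for v0, v1, m0, m1 in zip(speeds, speeds[1:], mid_times, mid_times[1:])]

    return {
        "average_velocity": total_dist / (times[-1] - times[0]),  # distance over duration
        "max_velocity": max(speeds),
        "mean_abs_acceleration": sum(abs(a) for a in accels) / len(accels),
    }


# Usage: three fixes 10 s apart, moving east along the equator.
segment = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.001), (20.0, 0.0, 0.002)]
print(kinematic_features(segment))
```

Spatial features (for example, proximity to public stations) and contextual features (for example, a holiday flag) would be added to the same per-segment dictionary in the same way, given the relevant map or calendar data.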
Because features combined into a subset yield stronger identification than features used alone, the influence of each feature was analyzed within a set of features rather than individually. Two approaches, “number of feature repetitions” and “Shapley Additive Explanations (SHAP) value,” were adopted to interpret the results. “Average velocity” was selected for every dataset and classification model (nine repetitions) and had the highest SHAP value, making it the most influential feature across all datasets and classification models. The “public stations indicator” was the most influential spatial feature: it had the highest SHAP value among spatial features and appeared nine times, while “holiday” had the most repetitions among the contextual features.
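The two interpretation approaches can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. `repetition_counts` tallies how many of the per-(dataset, model) selected subsets contain each feature (three datasets times three models gives at most nine). `exact_shapley` computes exact Shapley values by enumerating feature coalitions and replacing absent features with a single baseline vector. Because that enumeration is exponential in the number of features, it only suits a toy scoring function; for tree ensembles such as Random Forest, LightGBM, and XGBoost, the `shap` package's `TreeExplainer` is the usual tool. The toy model, baseline, and subsets below are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Sequence


def repetition_counts(selected: Dict[str, List[str]]) -> Counter:
    """Count how many selected subsets (one per dataset/model pair) contain each feature."""
    return Counter(feat for subset in selected.values() for feat in set(subset))


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def mean_abs_shapley(f, rows, baseline) -> List[float]:
    """Global importance per feature: mean absolute Shapley value over rows."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for i, p in enumerate(exact_shapley(f, x, baseline)):
            totals[i] += abs(p)
    return [t / len(rows) for t in totals]


# Usage with made-up inputs (not results from the paper).
FEATURES = ["average_velocity", "public_stations_indicator", "holiday"]
selected = {
    "dataset_A/model_1": ["average_velocity", "holiday"],
    "dataset_A/model_2": ["average_velocity", "public_stations_indicator"],
    "dataset_B/model_1": ["average_velocity"],
}
print(repetition_counts(selected).most_common())


def toy_score(z: Sequence[float]) -> float:
    # Hypothetical model: linear in feature 0, interaction between features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]


rows = [[3.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
print(dict(zip(FEATURES, mean_abs_shapley(toy_score, rows, [0.0, 0.0, 0.0]))))
```

The interaction term illustrates the abstract's point about interdependent features: in the toy model, the credit for `z[1] * z[2]` is split evenly between the two features, and neither would look important if scored without the other.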




Updated: 2024-05-18