Scalable concept drift adaptation for stream data mining
Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-06-20, DOI: 10.1007/s40747-024-01524-x
Lisha Hu, Wenxiu Li, Yaru Lu, Chunyu Hu

Stream data mining handles continuously generated data flows (e.g., weather, stock, and traffic data), which often undergo concept drift over time. Traditional offline algorithms struggle to learn from real-time data, so online algorithms are better suited to mining stream data with dynamic concepts. Among online learning algorithms, single-pass methods stand out for their efficiency: they process one sample point at a time and inspect each point at most once. Existing single-pass online algorithms for stream data convert the classification problem into a minimum enclosing ball problem. However, these methods mainly focus on expanding the ball to enclose new data. An excessively large ball may already cover data from a new concept, which makes it difficult to trigger the model update. This paper proposes a new online single-pass framework for stream data mining, Scalable Concept Drift Adaptation (SCDA), and presents three online methods (SCDA-I, SCDA-II, and SCDA-III) based on it. These methods dynamically adjust the ball by expanding or contracting it as new sample points arrive, thereby avoiding excessively large balls. To evaluate them, we conduct experiments on 7 synthetic and 5 real-world benchmark datasets and compare against state-of-the-art methods. The experiments demonstrate the applicability and flexibility of the SCDA methods in stream data mining across three aspects: predictive performance, memory usage, and scalability of the ball. Among them, SCDA-III performs best in all three aspects.
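The idea of a ball that expands or contracts as points arrive can be illustrated with a minimal Python sketch. This is not the SCDA-I, SCDA-II, or SCDA-III update rule, since the abstract does not specify those. The expansion step is the standard "smallest ball containing the old ball and the new point" update. The contraction step and its `shrink` factor are assumptions made purely for illustration, as is the class name `StreamingBall`.

```python
import math


class StreamingBall:
    """Single-pass ball over a stream. Illustrative only, not the SCDA rules."""

    def __init__(self, first_point, shrink=0.99):
        self.center = list(first_point)
        self.radius = 0.0
        self.shrink = shrink  # hypothetical contraction factor (assumption)

    def update(self, point):
        """Inspect one point exactly once and adjust the ball."""
        dist = math.dist(self.center, point)
        if dist > self.radius:
            # Expand: smallest ball enclosing both the old ball and the new point.
            shift = (dist - self.radius) / (2.0 * dist)
            self.center = [c + shift * (p - c) for c, p in zip(self.center, point)]
            self.radius = (self.radius + dist) / 2.0
            return "expanded"
        # Contract (assumed rule): shrink the radius, but keep the current point inside.
        self.radius = max(dist, self.radius * self.shrink)
        return "contracted"


ball = StreamingBall([0.0, 0.0], shrink=0.5)
print(ball.update([2.0, 0.0]), ball.center, ball.radius)
print(ball.update([1.0, 0.0]), ball.center, ball.radius)
```

In this sketch, contraction lets older points drift outside the ball, so a later point from a new concept is more likely to fall outside and trigger an update. Whether the SCDA methods contract the ball in a comparable way is not stated in the abstract.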




Updated: 2024-06-20