PaSTiLa: Scalable Parallel Algorithm for Unsupervised Labeling of Long Time Series
Lobachevskii Journal of Mathematics (IF 0.8), Pub Date: 2024-07-19, DOI: 10.1134/s1995080224600766
M. L. Zymbler, A. I. Goglachev

Abstract

Summarization aims to discover a small set of typical subsequences (patterns) in a given long time series that represent the whole series. One can then label the time series without supervision by assigning each subsequence a tag that corresponds to its most similar pattern. In previous research, we developed the PSF (Parallel Snippet-Finder) algorithm for time series summarization on a GPU, where a snippet is a subsequence of a given length that is similar to many other subsequences with respect to the bespoke distance measure MPdist. However, PSF requires that the snippet length be predefined by a domain expert. In this article, we introduce a novel parallel algorithm, PaSTiLa (Parallel Snippet-based Time series Labeling), which discovers snippets and produces the labeling of a given time series on an HPC cluster with GPU nodes. Unlike its predecessor, PaSTiLa automatically selects the snippet length from a specified range using our proposed heuristic criterion. In experiments on labeling quality over time series from the TSSB (Time Series Segmentation Benchmark) dataset, PaSTiLa outperforms state-of-the-art segmentation-based competitors in average \(\textrm{F}_{1}\) score. For long time series (typically more than 8–10 K points), PaSTiLa also runs faster than its rivals. Finally, on million-length time series, our algorithm demonstrates close-to-linear speedup.
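
The labeling step described in the abstract (tagging each subsequence with its most similar pattern) can be illustrated with a minimal serial sketch. This is not the authors' implementation: it assumes the snippets are already known, and it substitutes z-normalized Euclidean distance for MPdist, which the abstract names but does not define. The function names `znorm` and `label_series` are hypothetical.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; a constant window maps to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def label_series(ts, snippets):
    """Tag every length-m window of ts with the index of its nearest snippet.

    Assumption: z-normalized Euclidean distance stands in for MPdist,
    and all snippets share the same length m.
    """
    ts = np.asarray(ts, dtype=float)
    m = len(snippets[0])
    patterns = [znorm(s) for s in snippets]
    labels = np.empty(len(ts) - m + 1, dtype=int)
    for i in range(len(labels)):
        window = znorm(ts[i:i + m])
        # Pick the snippet with the smallest distance to this window.
        dists = [np.linalg.norm(window - p) for p in patterns]
        labels[i] = int(np.argmin(dists))
    return labels
```

Unlike MPdist, the stand-in distance is sensitive to phase shifts, so this sketch labels reliably only the windows that are aligned with a snippet.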




Updated: 2024-07-20