Abstract :
Time series motifs are repeated patterns in long and noisy time series. Motifs are typically used to understand the dynamics of the source because repeated patterns with high similarity provide strong evidence that they are not the product of noise. Recently, time series motifs have also been used as features for clustering, summarization, rule discovery, and compression. For all such purposes, many high-quality motifs of various lengths are desirable, which gives rise to the problem of enumerating motifs over a wide range of lengths. Existing algorithms find motifs for a given length. A trivial way to enumerate motifs is to run one of these algorithms for every length in the range. However, such a parameter sweep is computationally infeasible for large real datasets. In this paper, we describe an exact algorithm, called MOEN, to enumerate motifs. The algorithm is an order of magnitude faster than the naive algorithm. It frees us from re-discovering the same motif at different lengths and from tuning multiple data-dependent parameters. The speedup comes from a novel bound on the similarity function across lengths, and, unlike other motif discovery algorithms, MOEN uses only linear space. We describe three case studies in entomology and activity recognition where MOEN enumerates several high-quality motifs.
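To make the naive baseline concrete, the following is a minimal sketch of the parameter sweep described above, not of MOEN itself. It assumes z-normalized Euclidean distance as the similarity function and defines a motif as the closest pair of non-overlapping subsequences; the abstract fixes neither choice, and the names `znorm`, `motif_of_length`, and `naive_enumerate` are hypothetical.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def motif_of_length(series, m):
    """Brute-force search for the closest pair of length-m subsequences.

    Returns (distance, i, j). Assumption: distance is z-normalized
    Euclidean, and overlapping pairs (trivial matches) are excluded.
    """
    windows = [znorm(series[s:s + m]) for s in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        # Starting j at i + m guarantees the two windows do not overlap.
        for j in range(i + m, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best


def naive_enumerate(series, min_len, max_len):
    """Naive sweep: rerun the fixed-length search for every length."""
    return {m: motif_of_length(series, m) for m in range(min_len, max_len + 1)}


# Toy series with the same six-point pattern planted at offsets 4 and 15.
pattern = [0.0, 1.0, 3.0, 1.0, 0.0, -2.0]
series = [5.0, 5.5, 4.0, 7.0] + pattern + [9.0, 2.0, 8.0, 3.5, 6.0] + pattern + [1.0, 4.0]
motifs = naive_enumerate(series, 4, 6)
```

Each call to `motif_of_length` in this sketch compares a quadratic number of window pairs, and the sweep repeats that work from scratch for every length, which illustrates why the sweep becomes expensive on large datasets and why a bound that carries information across lengths is attractive.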
Keywords :
data mining; pattern clustering; time series; MOEN exact algorithm; activity recognition; entomology; high quality motifs; large real datasets; linear space; motif discovery algorithms; multiple data-dependent parameter tuning; naive algorithm; parameter sweep; repeated patterns; rule discovery; similarity function; time series motif enumeration; Algorithm design and analysis; Clustering algorithms; Electroencephalography; Force; Noise measurement; Time series analysis; Upper bound; Distance bound; Enumeration; Time series motif