Title :
Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data
Author :
Rakthanmanon, Thanawin ; Keogh, Eamonn J. ; Lonardi, Stefano ; Evans, Scott
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of California, Riverside, CA, USA
Abstract :
Given the pervasiveness of time series data in all human endeavors, and the ubiquity of clustering as a data mining application, it is somewhat surprising that the problem of clustering time series from a single stream remains largely unsolved. Most work on time series clustering considers the clustering of individual time series, e.g., gene expression profiles, individual heartbeats, or individual gait cycles. The few attempts at clustering time series streams have, in some cases, been shown to be objectively incorrect and, in others, been shown to work only on the most contrived datasets after careful tuning of a large set of parameters. In this work, we make two fundamental contributions. First, we show that the problem definition currently used for time series clustering from streams is inherently flawed, and that a new definition is necessary. Second, we show that the Minimum Description Length (MDL) framework offers an efficient, effective, and essentially parameter-free method for time series clustering. We show that our method produces objectively correct results on a wide variety of datasets from medicine, zoology, and industrial process analyses.
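The abstract names the Minimum Description Length (MDL) framework but gives no formulas. The Python sketch below shows one common way MDL bit costs can be used to judge whether two subsequences belong together: discretize each subsequence, price it as (length × empirical entropy) bits, and price it "given" a hypothesis subsequence by encoding only the residual. This is an illustrative assumption, not the paper's exact procedure. The uniform min-max quantizer, the default cardinality of 16, and the functions discretize, description_length, conditional_description_length, and bits_saved are all hypothetical. A fuller MDL treatment would also charge the bits needed to store the hypothesis itself, and this sketch omits that cost.

```python
import math
from collections import Counter
from typing import List, Sequence


def discretize(values: Sequence[float], cardinality: int = 16) -> List[int]:
    """Map real values onto integers 0..cardinality-1 by min-max scaling.

    Assumption: a simple uniform quantizer chosen for illustration.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    scale = (cardinality - 1) / (hi - lo)
    return [round((v - lo) * scale) for v in values]


def description_length(symbols: Sequence[int]) -> float:
    """Bits to entropy-code the symbols: length * empirical Shannon entropy."""
    n = len(symbols)
    counts = Counter(symbols)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return n * entropy


def conditional_description_length(a: Sequence[int], h: Sequence[int]) -> float:
    """Bits to encode `a` given hypothesis `h` by coding only the residual a - h."""
    if len(a) != len(h):
        raise ValueError("a and h must have equal length")
    return description_length([x - y for x, y in zip(a, h)])


def bits_saved(a: Sequence[int], h: Sequence[int]) -> float:
    """Positive when `h` makes `a` cheaper to describe (evidence of a shared cluster).

    Note: ignores the cost of storing `h` itself.
    """
    return description_length(a) - conditional_description_length(a, h)


if __name__ == "__main__":
    # Hypothetical toy data: a sine pattern, a lightly perturbed copy of it,
    # and an unrelated scrambled ramp of the same length.
    base = [math.sin(2 * math.pi * i / 32) for i in range(64)]
    similar = [v + 0.05 * ((i % 3) - 1) for i, v in enumerate(base)]
    unrelated = [((i * 37) % 64) / 64 for i in range(64)]
    a, h_match, h_other = (discretize(x) for x in (similar, base, unrelated))
    print("bits saved with matching hypothesis:", bits_saved(a, h_match))
    print("bits saved with unrelated hypothesis:", bits_saved(a, h_other))
```

Running the script should report far more bits saved for the matching hypothesis than for the unrelated one, which is the intuition behind grouping subsequences that compress one another well.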
Keywords :
data mining; pattern clustering; time series; data mining application; gene expression profiles; human endeavors; industrial process analyses; medicine; minimum description length framework; parameter free method; time series epenthesis; time series streams clustering; zoology; Clustering algorithms; Data mining; Encoding; Entropy; Euclidean distance; Handicapped aids; Time series analysis; MDL; clustering; time series;
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4577-2075-8
DOI :
10.1109/ICDM.2011.146