Title :
Distance measures for effective clustering of ARIMA time-series
Author :
Kalpakis, Konstantinos; Gada, Dhiral; Puttagunta, Vasundhara
Author_Institution :
Dept. of Comput. Sci. & Electron. Eng., Maryland Univ., Baltimore, MD, USA
Abstract :
Much environmental and socioeconomic time-series data can be adequately modeled using autoregressive integrated moving average (ARIMA) models. We call such time series "ARIMA time series". We propose the use of the linear predictive coding (LPC) cepstrum for clustering ARIMA time series, using the Euclidean distance between the LPC cepstra of two time series as their dissimilarity measure. We demonstrate that LPC cepstral coefficients have the desired features for accurate clustering and efficient indexing of ARIMA time series. For example, just a few LPC cepstral coefficients are sufficient to discriminate between time series that are modeled by different ARIMA models. In fact, this approach requires fewer coefficients than traditional approaches such as the DFT (discrete Fourier transform) and the DWT (discrete wavelet transform). The proposed distance measure can also be used to measure the similarity between different ARIMA models. We cluster ARIMA time series using the "partition around medoids" method with various similarity measures. We present experimental results demonstrating that, using the proposed measure, we achieve significantly better clusterings of ARIMA time series data than those obtained using other traditional similarity measures, such as DFT, DWT, and PCA (principal component analysis). Experiments were performed on both simulated and real data.
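The dissimilarity measure described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each series is first differenced d times (the "I" in ARIMA), that the result is approximated by an AR(p) model fitted by ordinary least squares (MA terms are absorbed into the AR fit, an assumption of this sketch), and that the AR coefficients are converted to LPC cepstral coefficients with the standard recursion c_n = phi_n + sum_{m=1}^{n-1} (1 - m/n) phi_m c_{n-m} (phi_n taken as 0 for n > p). The helper names `simulate_ar`, `fit_ar`, `lpc_cepstrum`, and `lpc_cepstral_distance`, along with the defaults p = 2 and 10 coefficients, are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_ar(phi, n, rng, burn_in=100):
    """Simulate n points of a zero-mean AR(p) process with Gaussian noise."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    x = np.zeros(n + burn_in)
    e = rng.standard_normal(n + burn_in)
    for t in range(p, n + burn_in):
        # x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t
        x[t] = phi @ x[t - p:t][::-1] + e[t]
    return x[burn_in:]  # drop the burn-in so start-up effects fade


def fit_ar(x, p):
    """Least-squares estimate of phi_1..phi_p (assumes x is roughly stationary)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Column k-1 holds the lag-k values x_{t-k} for t = p..n-1.
    X = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return phi


def lpc_cepstrum(phi, n_coeffs):
    """Cepstral coefficients c_1..c_N of the AR transfer function 1 / (1 - sum_k phi_k z^-k)."""
    p = len(phi)
    c = np.zeros(n_coeffs)
    for n in range(1, n_coeffs + 1):
        acc = phi[n - 1] if n <= p else 0.0
        # Sum runs over m = 1..n-1 when n <= p, and over m = 1..p when n > p.
        for m in range(1, min(n - 1, p) + 1):
            acc += (1 - m / n) * phi[m - 1] * c[n - m - 1]
        c[n - 1] = acc
    return c


def lpc_cepstral_distance(x, y, p=2, n_coeffs=10, d=0):
    """Euclidean distance between the LPC cepstra of two series.

    d is the differencing order applied first; p is the AR order fitted afterwards.
    Both defaults are illustrative, not values taken from the paper.
    """
    cx = lpc_cepstrum(fit_ar(np.diff(np.asarray(x, dtype=float), n=d), p), n_coeffs)
    cy = lpc_cepstrum(fit_ar(np.diff(np.asarray(y, dtype=float), n=d), p), n_coeffs)
    return float(np.linalg.norm(cx - cy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a1 = simulate_ar([0.8], 2000, rng)   # two series from the same AR(1) model
    a2 = simulate_ar([0.8], 2000, rng)
    b = simulate_ar([-0.5], 2000, rng)   # a series from a different AR(1) model
    print("same model:     ", lpc_cepstral_distance(a1, a2, p=1))
    print("different model:", lpc_cepstral_distance(a1, b, p=1))
```

Two series drawn from the same model should yield a much smaller distance than two series from different models. For an AR(1) model with coefficient a, the recursion reduces to c_n = a^n / n, which gives a quick sanity check. The resulting pairwise distances could then be passed to any medoid-based clustering routine.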
Keywords :
autoregressive moving average processes; cepstral analysis; data mining; economic cybernetics; environmental factors; linear predictive coding; pattern clustering; social sciences; socio-economic effects; temporal databases; time series; ARIMA time-series clustering; Euclidean distance; LPC cepstral coefficients; autoregressive integrated moving average; dissimilarity measure; distance measure; environmental data; indexing; linear predictive coding; partition-around-medoids method; similarity measures; socioeconomic data; Cepstral analysis; Cepstrum; Discrete Fourier transforms; Discrete wavelet transforms; Euclidean distance; Fourier transforms; Indexing; Linear predictive coding; Principal component analysis; Time measurement;
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
DOI :
10.1109/ICDM.2001.989529