Title :
Estimating the number of segments in time series data using permutation tests
Author :
Vasko, Kari T. ; Toivonen, Hannu T T
Author_Institution :
Dept. of Comput. Sci., Helsinki Univ., Finland
Abstract :
Segmentation is a popular technique for discovering structure in time series data. We address the largely open problem of estimating the number of segments that can be reliably discovered. We introduce a novel method for the problem, called Pete, which is based on permutation testing. The problem is an instance of model (dimension) selection. The proposed method analyzes the possible overfit of a model to the available data rather than using a term for penalizing model complexity. In this respect the approach is more similar to cross-validation than to regularization-based techniques (e.g., AIC, BIC, MDL, MML). Furthermore, the method produces a p-value for each increase in the number of segments. This gives the user an overview of the statistical significance of segmentations. We evaluate the performance of the proposed method using both synthetic and real time series data. The experiments show that permutation testing gives realistic results for the number of reliably identifiable segments and compares favorably with Monte Carlo cross-validation (MCCV) and the commonly used BIC criterion.
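The abstract does not spell out the test statistic or the segmentation model, so the following is only a minimal sketch of the general idea, not the authors' Pete implementation. It assumes a piecewise-constant model fitted by dynamic programming with squared error, and it assumes that the statistic for each step from k to k+1 segments is the relative reduction in error, compared between the original series and randomly permuted copies. The function names `segmentation_errors` and `permutation_p_values` and all parameters are hypothetical.

```python
import numpy as np

def segmentation_errors(y, k_max):
    """Minimal squared error of a piecewise-constant fit with 1..k_max segments.

    Assumption: optimal segmentation by dynamic programming; the abstract
    does not state which segmentation algorithm or error measure is used.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))      # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))  # prefix sums of squares
    idx = np.arange(n + 1)
    prev = np.full(n + 1, np.inf)
    prev[1:] = s2[1:] - s1[1:] ** 2 / idx[1:]       # one segment covering y[:j]
    errors = [prev[n]]
    for k in range(2, k_max + 1):
        cur = np.full(n + 1, np.inf)
        for j in range(k, n + 1):
            i = idx[k - 1:j]                        # candidate starts of the last segment
            tot = s1[j] - s1[i]
            seg = (s2[j] - s2[i]) - tot * tot / (j - i)
            cur[j] = np.min(prev[i] + seg)
        prev = cur
        errors.append(prev[n])
    return np.maximum(np.array(errors), 0.0)        # clip rounding noise

def permutation_p_values(y, k_max, n_perm=200, seed=0):
    """p-value for each increase k -> k+1 in the number of segments, k = 1..k_max-1.

    Assumption: the statistic is the relative error reduction between
    successive segmentations; permuting the series destroys temporal structure.
    """
    rng = np.random.default_rng(seed)

    def reductions(series):
        e = segmentation_errors(series, k_max)
        return (e[:-1] - e[1:]) / np.maximum(e[:-1], 1e-12)

    observed = reductions(y)
    exceed = np.zeros(k_max - 1)
    for _ in range(n_perm):
        exceed += reductions(rng.permutation(y)) >= observed
    return (1.0 + exceed) / (1.0 + n_perm)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic series with three level segments plus unit-variance noise.
    levels = np.repeat([0.0, 5.0, 0.0], 40)
    series = levels + rng.normal(size=levels.size)
    print(permutation_p_values(series, k_max=4))
```

Under these assumptions, a small p-value for the step from k to k+1 segments suggests that the extra segment captures structure that random orderings of the same values do not show, while a large p-value suggests that the extra segment may only be fitting noise.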
Keywords :
data mining; pattern clustering; time series; Pete; cross-validation; model selection; overfit model; performance evaluation; permutation tests; segment number estimation; segmentation; statistical significance; time series data; Computer science; Data mining; Lakes; Monte Carlo methods; Multidimensional systems; Organisms; Sediments; Testing; Time measurement; Time series analysis;
Conference_Title :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
DOI :
10.1109/ICDM.2002.1183990