DocumentCode :
390908
Title :
Estimating the number of segments in time series data using permutation tests
Author :
Vasko, Kari T.; Toivonen, Hannu T. T.
Author_Institution :
Dept. of Comput. Sci., Helsinki Univ., Finland
fYear :
2002
fDate :
2002
Firstpage :
466
Lastpage :
473
Abstract :
Segmentation is a popular technique for discovering structure in time series data. We address the largely open problem of estimating the number of segments that can be reliably discovered. We introduce Pete, a novel method for this problem based on permutation testing. The problem is an instance of model (dimension) selection. The proposed method analyzes the possible overfit of a model to the available data rather than using a term that penalizes model complexity. In this respect the approach is more similar to cross-validation than to regularization-based techniques (e.g., AIC, BIC, MDL, MML). Furthermore, the method produces a p-value for each increase in the number of segments, which gives the user an overview of the statistical significance of the segmentations. We evaluate the performance of the proposed method on both synthetic and real-world time series data. The experiments show that permutation testing gives realistic results for the number of reliably identifiable segments and compares favorably with Monte Carlo cross-validation (MCCV) and the commonly used BIC criterion.
Keywords :
data mining; pattern clustering; time series; Pete; cross-validation; model selection; overfit model; performance evaluation; permutation tests; segment number estimation; segmentation; statistical significance; time series data; Computer science; Data mining; Lakes; Monte Carlo methods; Multidimensional systems; Organisms; Sediments; Testing; Time measurement; Time series analysis
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183990
Filename :
1183990