• DocumentCode
    390908
  • Title

    Estimating the number of segments in time series data using permutation tests

  • Author

    Vasko, Kari T.; Toivonen, Hannu T. T.

  • Author_Institution
    Dept. of Comput. Sci., Helsinki Univ., Finland
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    466
  • Lastpage
    473
  • Abstract
    Segmentation is a popular technique for discovering structure in time series data. We address the largely open problem of estimating the number of segments that can be reliably discovered. We introduce a novel method for the problem, called Pete, which is based on permutation testing. The problem is an instance of model (dimension) selection. The proposed method analyzes the possible overfit of a model to the available data rather than using a term that penalizes model complexity. In this respect, the approach is more similar to cross-validation than to regularization-based techniques (e.g., AIC, BIC, MDL, MML). Furthermore, the method produces a p-value for each increase in the number of segments, which gives the user an overview of the statistical significance of the segmentations. We evaluate the performance of the proposed method using both synthetic and real time series data. The experiments show that permutation testing gives realistic results for the number of reliably identifiable segments and compares favorably with Monte Carlo cross-validation (MCCV) and the commonly used BIC criterion.
  • Keywords
    data mining; pattern clustering; time series; Pete; cross-validation; model selection; overfit model; performance evaluation; permutation tests; segment number estimation; segmentation; statistical significance; time series data; Computer science; Data mining; Lakes; Monte Carlo methods; Multidimensional systems; Organisms; Sediments; Testing; Time measurement; Time series analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-1754-4
  • Type

    conf

  • DOI
    10.1109/ICDM.2002.1183990
  • Filename
    1183990