• DocumentCode
    1505476
  • Title

    Dictionary-Based Compression for Long Time-Series Similarity

  • Author

    Lang, Willis ; Morse, Michael ; Patel, Jignesh M.

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Wisconsin, Madison, WI, USA
  • Volume
    22
  • Issue
    11
  • fYear
    2010
  • Firstpage
    1609
  • Lastpage
    1622
  • Abstract
    Long time-series data sets are common in many domains, especially scientific domains. Applications in these fields often require comparing trajectories using similarity measures. Existing methods perform well for short time series but their evaluation cost degrades rapidly for longer time series. In this work, we develop a new time-series similarity measure called the Dictionary Compression Score (DCS) for determining time-series similarity. We also show that this method allows us to accurately and quickly calculate similarity for both short and long time series. We use the well-known Kolmogorov Complexity in information theory and the Lempel-Ziv compression framework as a basis to calculate similarity scores. We show that off-the-shelf compressors do not fare well for computing time-series similarity. To address this problem, we developed a novel dictionary-based compression technique to compute time-series similarity. We also develop heuristics to automatically identify suitable parameters for our method, thus removing the task of parameter tuning found in other existing methods. We have extensively compared DCS with existing similarity methods for classification. Our experimental evaluation shows that for long time-series data sets, DCS is accurate, and it is also significantly faster than existing methods.
  • Keywords
    data compression; dictionaries; information theory; time series; Kolmogorov complexity; Lempel-Ziv compression framework; dictionary compression score; dictionary-based compression technique; long time-series data sets; similarity measures; Application software; Compressors; Computer science; Costs; Data engineering; Databases; Degradation; Distributed control; Performance evaluation; Size measurement; Spatial databases and GIS; database management
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2009.201
  • Filename
    5291693