• DocumentCode
    1070862
  • Title

    Efficient Similarity Join over Multiple Stream Time Series

  • Author

    Lian, Xiang ; Chen, Lei

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Kowloon, China
  • Volume
    21
  • Issue
    11
  • fYear
    2009
  • Firstpage
    1544
  • Lastpage
    1558
  • Abstract
    Similarity join (SJ) in time-series databases has a wide spectrum of applications such as data cleaning and mining. Specifically, an SJ query retrieves all pairs of (sub)sequences from two time-series databases that epsiv-match with each other, where epsiv is the matching threshold. Previous work on this problem usually considers static time-series databases, where queries are performed either on disk-based multidimensional indexes built on static data or by nested loop join (NLJ) without indexes. SJ over multiple stream time series, which continuously outputs pairs of similar subsequences from stream time series, strongly requires low memory consumption, low processing cost, and query procedures that are themselves adaptive to time-varying stream data. These requirements invalidate the existing approaches in static databases. In this paper, we propose an efficient and effective approach to perform SJ among multiple stream time series incrementally. In particular, we present a novel method, Adaptive Radius-based Search (ARES), which can answer the similarity search without false dismissals and is seamlessly integrated into SJ processing. Most importantly, we provide a formal cost model for ARES, based on which ARES can be adaptive to data characteristics, achieving the minimum number of refined candidate pairs, and thus, suitable for stream processing. Furthermore, in light of the cost model, we utilize space-efficient synopses that are constructed for stream time series to further reduce the candidate set. Extensive experiments demonstrate the efficiency and effectiveness of our proposed approach.
  • Keywords
    database management systems; time series; adaptive radius-based search; disk-based multidimensional index; multiple stream time series; nested loop join; similarity join; similarity search; space-efficient synopsis; static data; static time-series database; time-varying stream data; ARES; Stream time series; similarity join; synopsis.;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2009.27
  • Filename
    4752830