  • DocumentCode
    2183248
  • Title
    Efficient Similarity Join for Time Sequences Using Locality Sensitive Hash and Mapreduce
  • Author
    Dehua Chen ; Liangliang Zheng ; Meng Zhou ; Shoujian Yu
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
  • fYear
    2013
  • fDate
    16-19 Dec. 2013
  • Firstpage
    529
  • Lastpage
    533
  • Abstract
    In this paper we study how to efficiently perform a similarity join over massive collections of time sequences in parallel using Locality Sensitive Hash and MapReduce. To solve this problem, we propose a 4-stage approach to time sequence similarity join. Our approach takes as input a set of time sequences and outputs the pairs of time sequences that satisfy a similarity join condition. First, we map each time sequence into the frequency domain using the Discrete Fourier Transform to mitigate the curse of dimensionality. Second, we find candidate pairs of similar time sequences using Locality Sensitive Hash, which limits the pairwise similarity computation to those candidates and so keeps it efficient. Third, we propose solutions for removing duplicate pairs, which avoids recomputing the similarity of pairs that are selected as candidates more than once. Finally, to improve the performance of the similarity join over massive time sequences, we use the popular MapReduce framework in each step. The experimental results show that our method is efficient and scalable.
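The four stages summarized in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the record does not name the LSH family, so the sketch assumes p-stable (Gaussian projection) LSH for Euclidean distance, and every function name (`dft_features`, `make_hash_table`, `map_phase`, `shuffle`, `reduce_candidates`, `similarity_join`) and parameter (`k`, `num_tables`, `hashes_per_table`, `w`, `eps`) is illustrative. The map, shuffle and reduce functions run in-process as stand-ins for real MapReduce jobs.

```python
import cmath
import math
import random
from collections import defaultdict
from itertools import combinations


def dft_features(seq, k):
    """Stage 1: first k DFT coefficients (orthonormal 1/sqrt(n) scaling), split into
    real and imaginary parts. By Parseval's theorem, Euclidean distance on these
    features never exceeds the distance between the original sequences."""
    n = len(seq)
    feats = []
    for f in range(k):
        c = sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(seq))
        c /= math.sqrt(n)
        feats.extend([c.real, c.imag])
    return feats


def make_hash_table(dim, hashes_per_table, w, rng):
    """Assumed LSH family (not from the paper): h(v) = floor((a . v + b) / w)."""
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(hashes_per_table)]


def bucket_key(vec, table, w):
    # Concatenate the table's hash values into one bucket key.
    return tuple(math.floor((sum(a * v for a, v in zip(proj, vec)) + b) / w)
                 for proj, b in table)


def map_phase(records, tables, w):
    # Map: emit ((table id, bucket key), sequence id) once per hash table.
    for sid, vec in records:
        for t, table in enumerate(tables):
            yield (t, bucket_key(vec, table, w)), sid


def shuffle(kv_pairs):
    # In-process stand-in for the framework's shuffle: group values by key.
    groups = defaultdict(list)
    for key, value in kv_pairs:
        groups[key].append(value)
    return groups


def reduce_candidates(groups):
    # Stage 2 reduce: every pair sharing a bucket is a candidate. The same pair
    # can collide in several tables, so duplicates are expected here.
    for ids in groups.values():
        for pair in combinations(sorted(ids), 2):
            yield pair


def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def similarity_join(seqs, eps, k=4, num_tables=4, hashes_per_table=2, w=4.0, seed=0):
    """Return sorted pairs (i, j) whose time-domain Euclidean distance is <= eps.
    Recall is probabilistic (LSH), but every returned pair is verified exactly."""
    rng = random.Random(seed)
    records = [(sid, dft_features(s, k)) for sid, s in seqs.items()]
    tables = [make_hash_table(2 * k, hashes_per_table, w, rng) for _ in range(num_tables)]
    candidates = reduce_candidates(shuffle(map_phase(records, tables, w)))
    # Stage 3: de-duplicate by keying on the pair itself; grouping collapses repeats.
    unique_pairs = shuffle((pair, None) for pair in candidates).keys()
    # Exact check on the original sequences removes LSH false positives.
    return sorted(p for p in unique_pairs if euclid(seqs[p[0]], seqs[p[1]]) <= eps)
```

Using several hash tables raises the chance that a truly similar pair collides somewhere, at the cost of the same pair surfacing from more than one table; that trade-off is why a separate de-duplication step is worthwhile before the exact distance check.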
  • Keywords
    parallel algorithms; MapReduce framework; discrete Fourier transform; distributed algorithm; frequency domain; locality sensitive hash; parallel algorithm; time sequences; Data mining; Databases; Discrete Fourier transforms; Time series analysis; Time-frequency analysis; Locality Sensitive Hash; MapReduce; Similarity Join; Time Sequence;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on
  • Conference_Location
    Fuzhou
  • Print_ISBN
    978-1-4799-2829-3
  • Type
    conf
  • DOI
    10.1109/CLOUDCOM-ASIA.2013.58
  • Filename
    6821044