Title :
Efficient Similarity Join for Time Sequences Using Locality Sensitive Hash and Mapreduce
Author :
Dehua Chen ; Liangliang Zheng ; Meng Zhou ; Shoujian Yu
Author_Institution :
Sch. of Comput. Sci. & Technol., Donghua Univ., Shanghai, China
Abstract :
In this paper we study how to efficiently perform similarity join over massive time sequences in parallel using Locality Sensitive Hash and MapReduce. To solve the problem, we propose a 4-stage approach to time sequence similarity join. The approach takes as input a set of time sequences and outputs the pairs of time sequences that satisfy a similarity join condition. First, we map each time sequence into the frequency domain using the Discrete Fourier Transform to avoid the curse of dimensionality. Second, we find candidate pairs of similar time sequences using Locality Sensitive Hash, which keeps the pairwise similarity computation efficient. Third, we propose solutions for removing duplicate pairs, so that the similarity of a pair selected as a candidate more than once is not computed repeatedly. Finally, to improve the performance of similarity join over massive time sequences, we use the popular MapReduce framework in each step. The experimental results show that our method is efficient and scalable.
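The stages named in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not state the hash family, the number of Fourier coefficients kept, or the distance measure, so the sketch assumes Euclidean distance, a Gaussian random-projection hash family, and hypothetical parameters (`k`, `tables`, `per_table`, `width`). The MapReduce steps are imitated by grouping keys in a dictionary rather than by running on a cluster.

```python
import cmath
import math
import random
from collections import defaultdict
from itertools import combinations


def dft_features(seq, k):
    """Return the first k DFT coefficients of seq as a real vector of length 2k."""
    n = len(seq)
    feats = []
    for f in range(k):
        c = sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(seq))
        # Unitary scaling: the distance between truncated feature vectors
        # never exceeds the Euclidean distance between the sequences.
        c /= math.sqrt(n)
        feats.extend((c.real, c.imag))
    return feats


def make_tables(dim, tables, per_table, width, seed=0):
    """Assumed hash family: h(v) = floor((a . v + b) / width), a ~ N(0, 1)."""
    rng = random.Random(seed)
    return [
        [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width))
         for _ in range(per_table)]
        for _ in range(tables)
    ]


def bucket_key(vec, table, width):
    """Concatenate the per_table hash values of one table into a bucket key."""
    return tuple(
        math.floor((sum(a * v for a, v in zip(proj, vec)) + b) / width)
        for proj, b in table
    )


def similarity_join(seqs, eps, k=4, tables=8, per_table=2, width=4.0):
    """Return sorted id pairs whose Euclidean distance is at most eps."""
    # Stage 1: map every sequence to a short frequency-domain vector.
    feats = {sid: dft_features(s, k) for sid, s in seqs.items()}
    hash_tables = make_tables(2 * k, tables, per_table, width)

    # Stage 2, "map" side: emit ((table index, bucket key), sequence id).
    buckets = defaultdict(list)
    for sid, vec in feats.items():
        for t, table in enumerate(hash_tables):
            buckets[(t, bucket_key(vec, table, width))].append(sid)

    # Stage 2, "reduce" side, plus stage 3: pair up ids that share a bucket.
    # Storing ordered pairs in a set drops pairs found in several tables.
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))

    # Final check on the original sequences, once per candidate pair.
    return [(a, b) for a, b in sorted(candidates)
            if math.dist(seqs[a], seqs[b]) <= eps]


if __name__ == "__main__":
    demo = {
        "a": [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0],
        "b": [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0],
        "c": [50.0, -50.0, 50.0, -50.0, 50.0, -50.0, 50.0, -50.0],
    }
    print(similarity_join(demo, eps=0.5))
```

Locality-sensitive hashing is probabilistic: identical sequences always share a bucket, but a pair that is merely close can be missed if it collides in none of the tables. Raising `tables` or `width` in this sketch trades more candidate pairs for fewer misses.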
Keywords :
parallel algorithms; MapReduce framework; discrete Fourier transform; distributed algorithm; frequency domain; locality sensitive hash; parallel algorithm; time sequences; Data mining; Databases; Discrete Fourier transforms; Time series analysis; Time-frequency analysis; Locality Sensitive Hash; MapReduce; Similarity Join; Time Sequence
Conference_Title :
Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on
Conference_Location :
Fuzhou
Print_ISBN :
978-1-4799-2829-3
DOI :
10.1109/CLOUDCOM-ASIA.2013.58