DocumentCode :
249320
Title :
Practising Scalable Graph Similarity Joins in MapReduce
Author :
Yifan Chen ; Xiang Zhao ; Bin Ge ; Chuan Xiao ; Chi-Hung Chi
Author_Institution :
Sci. & Technol. on Inf. Syst. & Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2014
fDate :
June 27 - July 2, 2014
Firstpage :
112
Lastpage :
119
Abstract :
With the emergence of massive graph-modeled data, graph similarity joins have become important because of their wide range of applications, including data cleaning and near-duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs whose edit distance is no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm that follows the filtering-verification framework for efficient graph similarity joins. It counts overlapping graph signatures to filter out unpromising candidates. Because the filtering phase can generate too many key-value pairs, spectral Bloom filters are introduced to reduce their number. Furthermore, we integrate a multiway join strategy to speed up verification. Extensive experimental results demonstrate the efficiency and scalability of the proposed algorithms.
Keywords :
data analysis; data structures; information filtering; MGSJoin; MapReduce programming model; edit distance constraints; filtering-verification framework; multiway join strategy; overlapping graph signatures; scalable graph similarity joins; spectral Bloom filters; Abstracts; Big data; Complexity theory; Educational institutions; Filtering; Laboratories; Bloom filter; Graph similarity join; MapReduce; Multiway join;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
Type :
conf
DOI :
10.1109/BigData.Congress.2014.25
Filename :
6906768