DocumentCode
2457563
Title
Efficient Graph Similarity Joins with Edit Distance Constraints
Author
Zhao, Xiang ; Xiao, Chuan ; Lin, Xuemin ; Wang, Wei
Author_Institution
Univ. of New South Wales, Sydney, NSW, Australia
fYear
2012
fDate
1-5 April 2012
Firstpage
834
Lastpage
845
Abstract
Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources, such as erroneous data entry, and find similarity matches. In this paper, we study the graph similarity join problem, which returns pairs of graphs whose edit distance is no larger than a threshold. Inspired by the q-gram idea for string similarity problems, our solution extracts paths from graphs as features for indexing. We establish a lower bound on the number of common features to generate candidates. An efficient algorithm is proposed that exploits both matching and mismatching features to improve the filtering and verification of candidates. Extensive experiments on publicly available datasets demonstrate that the proposed algorithm significantly outperforms existing approaches.
Keywords
data analysis; graph theory; indexing; information filtering; query processing; data semantics; edit distance constraints; filtering techniques; graph similarity join problem; q-gram idea; string similarity problem; verification techniques; Approximation algorithms; Australia; Complexity theory; Educational institutions; Filtering; Greedy algorithms; Indexes
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location
Washington, DC
ISSN
1063-6382
Print_ISBN
978-1-4673-0042-1
Type
conf
DOI
10.1109/ICDE.2012.91
Filename
6228137