DocumentCode
610380
Title
Similarity query processing for probabilistic sets
Author
Ming Gao ; Cheqing Jin ; Wei Wang ; Xuemin Lin ; Aoying Zhou
Author_Institution
Shanghai Key Lab. on Trustworthy Comput., East China Normal Univ., Shanghai, China
fYear
2013
fDate
8-12 April 2013
Firstpage
913
Lastpage
924
Abstract
Evaluating similarity between sets is a fundamental task in computer science. However, there are many applications in which elements in a set may be uncertain due to various reasons. Existing work on modeling such probabilistic sets and computing their similarities suffers from huge model sizes or significant similarity evaluation cost, and hence is only applicable to small probabilistic sets. In this paper, we propose a simple yet expressive model that supports many applications where one probabilistic set may have thousands of elements. We define two types of similarities between two probabilistic sets using the possible world semantics; they complement each other in capturing the similarity distributions in the cross product of possible worlds. We design efficient dynamic programming-based algorithms to calculate both types of similarities. Novel individual and batch pruning techniques based on upper bounding the similarity values are also proposed. To accommodate extremely large probabilistic sets, we also design sampling-based approximate query processing methods with strong probabilistic guarantees. We have conducted extensive experiments using both synthetic and real datasets, and demonstrated the effectiveness and efficiency of our proposed methods.
Keywords
data integration; dynamic programming; query processing; sampling methods; probabilistic sets; pruning technique; sampling-based approximate query processing; similarity query processing; Approximation algorithms; Computational modeling; Heuristic algorithms; Probabilistic logic; Semantics; Upper bound
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location
Brisbane, QLD
ISSN
1063-6382
Print_ISBN
978-1-4673-4909-3
Electronic_ISBN
1063-6382
Type
conf
DOI
10.1109/ICDE.2013.6544885
Filename
6544885