DocumentCode :
610380
Title :
Similarity query processing for probabilistic sets
Author :
Ming Gao ; Cheqing Jin ; Wei Wang ; Xuemin Lin ; Aoying Zhou
Author_Institution :
Shanghai Key Lab. of Trustworthy Comput., East China Normal Univ., Shanghai, China
fYear :
2013
fDate :
8-12 April 2013
Firstpage :
913
Lastpage :
924
Abstract :
Evaluating similarity between sets is a fundamental task in computer science. However, there are many applications in which elements in a set may be uncertain due to various reasons. Existing work on modeling such probabilistic sets and computing their similarities suffers from huge model sizes or significant similarity evaluation cost, and hence is only applicable to small probabilistic sets. In this paper, we propose a simple yet expressive model that supports many applications where one probabilistic set may have thousands of elements. We define two types of similarities between two probabilistic sets using the possible world semantics; they complement each other in capturing the similarity distributions in the cross product of possible worlds. We design efficient dynamic programming-based algorithms to calculate both types of similarities. Novel individual and batch pruning techniques based on upper bounding the similarity values are also proposed. To accommodate extremely large probabilistic sets, we also design sampling-based approximate query processing methods with strong probabilistic guarantees. We have conducted extensive experiments using both synthetic and real datasets, and demonstrated the effectiveness and efficiency of our proposed methods.
Keywords :
data integration; dynamic programming; query processing; sampling methods; probabilistic sets; pruning technique; sampling-based approximate query processing; similarity query processing; Approximation algorithms; Computational modeling; Heuristic algorithms; Probabilistic logic; Semantics; Upper bound
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location :
Brisbane, QLD
ISSN :
1063-6382
Print_ISBN :
978-1-4673-4909-3
Electronic_ISBN :
1063-6382
Type :
conf
DOI :
10.1109/ICDE.2013.6544885
Filename :
6544885