DocumentCode :
3121804
Title :
Semantics of Ranking Queries for Probabilistic Data and Expected Ranks
Author :
Cormode, Graham ; Li, Feifei ; Yi, Ke
Author_Institution :
AT&T Labs. Res., Florham Park, NJ
fYear :
2009
fDate :
March 29 2009-April 2 2009
Firstpage :
305
Lastpage :
316
Abstract :
When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the top-k is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N)-no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
Keywords :
query processing; attribute-level uncertainty; expected ranks; probabilistic data; probabilistic databases; probabilistic tail bounds; pruning techniques; ranking queries; top-k queries; tuple-level uncertainty; Computer science; Costs; Data engineering; Inspection; Multimedia databases; Power engineering and energy; Relational databases; Streaming media; USA Councils; Uncertainty;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location :
Shanghai
ISSN :
1084-4627
Print_ISBN :
978-1-4244-3422-0
Electronic_ISBN :
1084-4627
Type :
conf
DOI :
10.1109/ICDE.2009.75
Filename :
4812412
Link To Document :
بازگشت