DocumentCode
1791635
Title
Lightweight approximate top-k for distributed settings
Author
Deolalikar, Vinay ; Eshghi, Kave
Author_Institution
Hewlett Packard Res., Sunnyvale, CA, USA
fYear
2014
fDate
27-30 Oct. 2014
Firstpage
835
Lastpage
844
Abstract
Consider the problem of finding the Top-k records in a relation based on the sum of their attributes. This problem occurs in various settings in big data management, for example in geographically distributed data centers and clouds, both at the application layer and the storage management layer. We propose a lightweight, distributed, order- and duplication-insensitive approach based on order statistics. The salient feature of our algorithm that makes it extremely lightweight is that it processes and communicates only the items most likely to be in the Top-k. We validate the efficacy of our algorithm on a wide range of datasets.
Keywords
Big Data; computer centres; distributed databases; statistical analysis; storage management; big data management; geographically distributed data centers; lightweight approximation; storage management layer; Approximation algorithms; Big data; Distributed databases; Exponential distribution; Google; Merging; Random variables
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location
Washington, DC
Type
conf
DOI
10.1109/BigData.2014.7004313
Filename
7004313