Title :
Lightweight approximate top-k for distributed settings
Author :
Deolalikar, Vinay ; Eshghi, Kave
Author_Institution :
Hewlett Packard Res., Sunnyvale, CA, USA
Abstract :
Consider the problem of finding the Top-k records in a relation based on the sum of their attributes. This problem occurs in various settings in big data management, for example in geographically distributed data centers and clouds, both at the application layer and the storage management layer. We propose a lightweight, distributed, order- and duplication-insensitive approach based on order statistics. The salient feature of our algorithm that makes it extremely lightweight is that it processes and communicates only the items most likely to be in the Top-k. We validate the efficacy of our algorithm on a wide range of datasets.
Keywords :
Big Data; computer centres; distributed databases; statistical analysis; storage management; big data management; geographically distributed data centers; lightweight approximation; storage management layer; Approximation algorithms; Exponential distribution; Google; Merging; Random variables
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004313