Lightweight approximate top-k for distributed settings

Author

Deolalikar, Vinay; Eshghi, Kave

Author_Institution

Hewlett Packard Res., Sunnyvale, CA, USA

fYear

2014

fDate

27-30 Oct. 2014

Firstpage

835

Lastpage

844

Abstract

Consider the problem of finding the Top-k records in a relation, ranked by the sum of their attributes. This problem arises in many big data management settings, for example in geographically distributed data centers and clouds, at both the application layer and the storage management layer. We propose a lightweight, distributed, order- and duplication-insensitive approach based on order statistics. The salient feature that makes our algorithm extremely lightweight is that it processes and communicates only the items most likely to be in the Top-k. We validate the efficacy of our algorithm on a wide range of datasets.
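To make the problem statement concrete, here is a minimal exact baseline in Python. It illustrates the problem only. It is not the authors' order-statistics algorithm, whose details this record does not give. The sketch assumes horizontally partitioned data: every site stores whole records, each record has a unique id, and records may be replicated across sites. The names `local_top_k`, `merge_top_k` and `rank_key` are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical record shape: (record_id, attribute values). Score = sum of attributes.
Record = Tuple[str, Sequence[float]]
Scored = Tuple[str, float]


def rank_key(rec_id: str, score: float) -> Tuple[float, str]:
    # Higher score ranks first; ties are broken by id so the ranking is a total order.
    return (-score, rec_id)


def local_top_k(records: Iterable[Record], k: int) -> List[Scored]:
    # Runs at one site: score each record by the sum of its attributes, keep the best k.
    scored: Dict[str, float] = {}
    for rec_id, attrs in records:
        scored[rec_id] = sum(attrs)  # a duplicate id within a site overwrites itself
    return heapq.nsmallest(k, scored.items(), key=lambda item: rank_key(*item))


def merge_top_k(partials: Iterable[List[Scored]], k: int) -> List[Scored]:
    # Runs at the coordinator: union of the local lists, deduplicated by record id.
    # Keying by id makes the merge insensitive to arrival order and to replicas
    # (a replicated record has the same id and score everywhere).
    candidates: Dict[str, float] = {}
    for partial in partials:
        for rec_id, score in partial:
            candidates[rec_id] = score
    return heapq.nsmallest(k, candidates.items(), key=lambda item: rank_key(*item))


if __name__ == "__main__":
    site_a = [("r1", [3, 4]), ("r2", [10, 0]), ("r3", [1, 1])]
    site_b = [("r4", [5, 5]), ("r2", [10, 0]), ("r5", [0, 2])]  # r2 is replicated
    parts = [local_top_k(site_a, 2), local_top_k(site_b, 2)]
    print(merge_top_k(parts, 2))
```

Because the ranking is a total order, a record in the global top-k has fewer than k records ranked above it anywhere, so it also appears in its own site's local top-k. The union of local lists therefore contains the exact answer. Unlike the lightweight approach the abstract describes, this baseline ships k items from every site, whether or not they are likely to reach the global Top-k.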

Keywords

Big Data; computer centres; distributed databases; statistical analysis; storage management; big data management; geographically distributed data centers; lightweight approximation; storage management layer; Approximation algorithms; Big data; Distributed databases; Exponential distribution; Google; Merging; Random variables;

fLanguage

English

Publisher

IEEE

Conference_Title

Big Data (Big Data), 2014 IEEE International Conference on

Conference_Location

Washington, DC

Type

conf

DOI

10.1109/BigData.2014.7004313

Filename

7004313

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=1791635