• DocumentCode
    1791635
  • Title
    Lightweight approximate top-k for distributed settings
  • Author
    Deolalikar, Vinay; Eshghi, Kave
  • Author_Institution
    Hewlett Packard Res., Sunnyvale, CA, USA
  • fYear
    2014
  • fDate
    27-30 Oct. 2014
  • Firstpage
    835
  • Lastpage
    844
  • Abstract
    Consider the problem of finding the Top-k records in a relation based on the sum of their attributes. This problem occurs in various big data management settings, such as geographically distributed data centers and clouds, at both the application layer and the storage management layer. We propose a lightweight, distributed, order- and duplication-insensitive approach based on order statistics. The salient feature that makes our algorithm extremely lightweight is that it processes and communicates only the items most likely to be in the Top-k. We validate the efficacy of our algorithm on a wide range of datasets.
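    The problem statement above can be made concrete with a minimal, centralized sketch of the exact Top-k that an approximate method would be measured against. It is not the authors' order-statistics algorithm. It assumes the whole relation fits in one Python dict that maps a hypothetical record ID to its attribute values, so a duplicated record is counted only once.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def exact_top_k(relation: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k records with the largest attribute sums, best first.

    Assumption (illustrative only): `relation` maps a hypothetical record ID
    to that record's attribute values; keying by ID means a record that
    arrives twice is still scored once.
    """
    # Score each record by the sum of its attributes.
    scored = ((rid, sum(attrs)) for rid, attrs in relation.items())
    # heapq.nlargest keeps only k candidates in memory while scanning.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    relation = {"a": [3, 1], "b": [2, 5], "c": [1, 0]}
    print(exact_top_k(relation, 2))
```

    Running it prints [('b', 7), ('a', 4)]. Every record is read and scored here, which is the per-item processing cost a lightweight distributed method tries to avoid.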
  • Keywords
    Big Data; computer centres; distributed databases; statistical analysis; storage management; big data management; geographically distributed data centers; lightweight approximation; storage management layer; Approximation algorithms; Big data; Distributed databases; Exponential distribution; Google; Merging; Random variables
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 IEEE International Conference on Big Data (Big Data)
  • Conference_Location
    Washington, DC
  • Type
    conf
  • DOI
    10.1109/BigData.2014.7004313
  • Filename
    7004313