• DocumentCode
    723695
  • Title
    Distinct Random Sampling from a Distributed Stream
  • Author
    Chung, Yung-Yu ; Tirthapura, Srikanta
  • Author_Institution
    Iowa State Univ., Ames, IA, USA
  • fYear
    2015
  • fDate
    25-29 May 2015
  • Firstpage
    532
  • Lastpage
    541
  • Abstract
    We consider continuous maintenance of a random sample of distinct elements from a massive data stream, whose input elements are observed at multiple distributed sites that communicate via a central coordinator. At any point, when a query is received at the coordinator, it responds with a random sample from the set of all distinct elements observed at the different sites so far. We present the first algorithms for distinct random sampling from a distributed stream. We also present a lower bound on the expected number of messages that must be transmitted by any distributed algorithm, showing that our algorithm is message optimal to within a factor of four. We present extensions to sliding windows, and experimental results showing the performance of our algorithm on real-world data sets.
  • Keywords
    data mining; distributed algorithms; query processing; random processes; sampling methods; central coordinator; continuous maintenance; distinct elements; distinct random sampling; distributed algorithm; distributed sites; distributed stream mining; massive data stream; query; random sample; sliding windows; Aggregates; Algorithm design and analysis; Complexity theory; Distributed algorithms; Distributed databases; Monitoring; Silicon; Distinct Elements; Distinct Sampling; Distributed Stream Mining; Random Sampling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
  • Conference_Location
    Hyderabad
  • ISSN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2015.97
  • Filename
    7161541