DocumentCode
723695
Title
Distinct Random Sampling from a Distributed Stream
Author
Chung, Yung-Yu ; Tirthapura, Srikanta
Author_Institution
Iowa State Univ., Ames, IA, USA
fYear
2015
fDate
25-29 May 2015
Firstpage
532
Lastpage
541
Abstract
We consider continuous maintenance of a random sample of distinct elements from a massive data stream, whose input elements are observed at multiple distributed sites that communicate via a central coordinator. At any point, when a query is received at the coordinator, it responds with a random sample from the set of all distinct elements observed at the different sites so far. We present the first algorithms for distinct random sampling from a distributed stream. We also present a lower bound on the expected number of messages that must be transmitted by any distributed algorithm, showing that our algorithm is message optimal to within a factor of four. We present extensions to sliding windows, and experimental results showing the performance of our algorithm on real-world data sets.
Keywords
data mining; distributed algorithms; query processing; random processes; sampling methods; central coordinator; continuous maintenance; distinct elements; distinct random sampling; distributed algorithm; distributed sites; distributed stream mining; massive data stream; query; random sample; sliding windows; Aggregates; Algorithm design and analysis; Complexity theory; Distributed algorithms; Distributed databases; Monitoring; Silicon; Distinct Elements; Distinct Sampling; Distributed Stream Mining; Random Sampling;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location
Hyderabad
ISSN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2015.97
Filename
7161541