DocumentCode :
723695
Title :
Distinct Random Sampling from a Distributed Stream
Author :
Chung, Yung-Yu ; Tirthapura, Srikanta
Author_Institution :
Iowa State Univ., Ames, IA, USA
fYear :
2015
fDate :
25-29 May 2015
Firstpage :
532
Lastpage :
541
Abstract :
We consider continuous maintenance of a random sample of distinct elements from a massive data stream, whose input elements are observed at multiple distributed sites that communicate via a central coordinator. At any point, when a query is received at the coordinator, it responds with a random sample from the set of all distinct elements observed at the different sites so far. We present the first algorithms for distinct random sampling from a distributed stream. We also present a lower bound on the expected number of messages that must be transmitted by any distributed algorithm, showing that our algorithm is message optimal to within a factor of four. We present extensions to sliding windows, and experimental results showing the performance of our algorithm on real-world data sets.
Keywords :
data mining; distributed algorithms; query processing; random processes; sampling methods; central coordinator; continuous maintenance; distinct elements; distinct random sampling; distributed algorithm; distributed sites; distributed stream mining; massive data stream; query; random sample; sliding windows; Aggregates; Algorithm design and analysis; Complexity theory; Distributed algorithms; Distributed databases; Monitoring; Silicon; Distinct Elements; Distinct Sampling; Distributed Stream Mining; Random Sampling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
ISSN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2015.97
Filename :
7161541