DocumentCode :
254817
Title :
On the Organization of Cluster Voting with Massive Distributed Streams
Author :
Alhudhaif, Adi ; Tong Yan ; Berkovich, Simon
Author_Institution :
Dept. of Comput. Sci., George Washington Univ., Washington, DC, USA
fYear :
2014
fDate :
4-6 Aug. 2014
Firstpage :
55
Lastpage :
62
Abstract :
Data processing is one of the major challenges of Big Data. In this paper, we investigate optimal processing algorithms for massive data streams and propose a new algorithm, the multi-buffer based majority algorithm. The algorithm maintains a time complexity of O(n) and selects prevalent elements with frequencies as low as 1%. Our experiments indicate that the multi-buffer based majority algorithm improves both accuracy and efficiency. We also use the multi-buffer based algorithm to process data streams on a single system and on a distributed system; these experiments indicate that the algorithm performs better on a distributed system. Finally, we explain the experimental results and identify several major factors that influence the accuracy of the result: stream size, the range of elements in the stream, the frequency of the predominant elements, and our buffer sets.
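The abstract names the multi-buffer based majority algorithm but does not describe how it works, so the sketch below is not the authors' method. It illustrates the general problem of selecting prevalent elements from a stream, using the classic Misra-Gries frequent-items summary as a stand-in. The function names `frequent_candidates` and `prevalent_elements` and the `threshold` parameter are assumptions made for illustration.

```python
import math


def frequent_candidates(stream, k):
    """Misra-Gries summary with at most k - 1 counters.

    Any element occurring more than len(stream) / k times is guaranteed
    to be among the returned keys; the stored counts are only lower bounds.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def prevalent_elements(stream, threshold):
    """Return {element: exact count} for elements whose relative
    frequency exceeds `threshold` (for example 0.01 for 1%).

    `stream` must be a list here because a second pass verifies the
    candidates found in the first pass.
    """
    k = math.ceil(1 / threshold)
    candidates = frequent_candidates(stream, k)
    exact = {c: 0 for c in candidates}
    for item in stream:
        if item in exact:
            exact[item] += 1
    limit = threshold * len(stream)
    return {c: n for c, n in exact.items() if n > limit}


if __name__ == "__main__":
    data = [1] * 50 + [2] * 30 + list(range(100, 120))
    print(prevalent_elements(data, 0.25))
```

With `threshold=0.01` the summary keeps at most 99 counters regardless of stream length, which is the property that makes this family of algorithms attractive for massive streams. The second pass is optional when approximate candidates are acceptable.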
Keywords :
Big Data; computational complexity; distributed processing; cluster voting; data processing; massive data streams; massive distributed streams; time complexity; Accuracy; Algorithm design and analysis; Approximation algorithms; Clustering algorithms; Radiation detectors; big data clusterization; cloud computing; majority algorithm; stream processing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing for Geospatial Research and Application (COM.Geo), 2014 Fifth International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/COM.Geo.2014.3
Filename :
6910121