DocumentCode :
3740363
Title :
A performance study of the chain sampling algorithm
Author :
Rayane El Sibai;Yousra Chabchoub;Jacques Demerjian;Zakia Kazi-Aoul;Kabalan Barbar
Author_Institution :
LISITE Laboratory, ISEP, Paris, France
fYear :
2015
Firstpage :
487
Lastpage :
494
Abstract :
Online data stream analysis is an important challenge today because of the ever-increasing rates of streams issued from multiple heterogeneous sources in many application domains. To reduce the volume of stream data, the data stream research community has designed several sampling methods. In this paper, we focus on the chain sampling algorithm proposed by Babcock et al. The aim of this algorithm is to select randomly, at any time, a given fixed proportion of the most recent items of the stream, that is, those contained in the last sliding window. The algorithm is well suited to the stream context because it performs only one pass over the data. Moreover, it uses little memory, as it does not store all the items of the current sliding window. We show in this paper that the chain sampling algorithm suffers from collision, or redundancy, problems. A collision occurs when the same item is selected as a sample more than once during the execution of the algorithm. We propose two approaches to overcome this weakness and improve the chain sampling algorithm. The first is called “inverting the selection for a high sampling rate”, and the second is inspired by the divide-and-conquer strategy. Several experiments are performed to show the efficiency of these two improvements, in particular their impact on the execution time of the algorithm.
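For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of chain sampling over a count-based sliding window, written from the commonly described form of the algorithm by Babcock et al. and not taken from this paper. The names `ChainSampler`, `offer`, `sample`, and `sample_window` are hypothetical, and drawing several samples by running independent chains is an assumption made here to illustrate how the same item can be selected more than once.

```python
import random
from collections import deque


class ChainSampler:
    """One chain: keeps a single uniform sample of the last `window` items."""

    def __init__(self, window, rng=None):
        self.window = window
        self.rng = rng or random.Random()
        self.chain = deque()    # (index, value) pairs; the head is the sample
        self.next_index = None  # index of the item that will extend the chain
        self.count = 0          # number of items seen so far

    def _pick_successor(self, i):
        # The replacement for item i is drawn uniformly from the next window.
        self.next_index = self.rng.randint(i + 1, i + self.window)

    def offer(self, value):
        self.count += 1
        i = self.count
        # Drop chain entries that have slid out of the window.
        while self.chain and self.chain[0][0] <= i - self.window:
            self.chain.popleft()
        if self.rng.random() < 1.0 / min(i, self.window):
            # The new item becomes the sample; the old chain is discarded.
            self.chain.clear()
            self.chain.append((i, value))
            self._pick_successor(i)
        elif i == self.next_index:
            # The new item was pre-selected as a successor; append it.
            self.chain.append((i, value))
            self._pick_successor(i)

    def sample(self):
        return self.chain[0] if self.chain else None


def sample_window(stream, window, k, seed=0):
    """Assumption: k samples are drawn by k independent chains."""
    rng = random.Random(seed)
    chains = [ChainSampler(window, rng) for _ in range(k)]
    for x in stream:
        for c in chains:
            c.offer(x)
    return [c.sample() for c in chains]
```

Because each chain in this sketch selects independently, two chains can hold the same item at the same time. Under that assumption, duplicates become more likely as the number of samples approaches the window size, which is one way to picture the collision problem the abstract describes.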
Keywords :
"Memory management","Chlorine"
Publisher :
IEEE
Conference_Title :
Intelligent Computing and Information Systems (ICICIS), 2015 IEEE Seventh International Conference on
Print_ISBN :
978-1-5090-1949-6
Type :
conf
DOI :
10.1109/IntelCIS.2015.7397265
Filename :
7397265