Title of article :
Finding top-k elements in data streams
Author/Authors :
Nuno Homem, Joao Paulo Carvalho
Issue Information :
Journal with serial number, year 2010
Pages :
17
From page :
4958
To page :
4974
Abstract :
Identifying the most frequent elements in a data stream is a well-known and difficult problem. Identifying the most frequent elements for each individual, especially in very large populations, is even harder. The use of fast algorithms with a small memory footprint is paramount when the number of individuals is very large. In many situations such analysis needs to be performed and kept up to date in near real time. Fortunately, approximate answers are usually adequate when dealing with this problem. This paper presents a new algorithm that addresses this problem by merging the commonly used counter-based and sketch-based techniques for top-k identification. The algorithm provides the top-k list of elements, their frequencies and an error estimate for each frequency value. It also provides strong guarantees on the error estimate, the order of elements and the inclusion of elements in the list, depending on their real frequency. Additionally, the algorithm provides stochastic bounds on the error and expected error estimates. Telecommunications customers' behavior and voice call data are used to present concrete results obtained with this algorithm and to illustrate improvements over previously existing algorithms.
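The abstract names the two ingredients but does not spell out how they are merged. The Python sketch below is a hypothetical illustration of one way to combine them, not the authors' algorithm: a Space-Saving-style counter list (a counter-based technique that keeps, for each monitored element, a count and an error bound) whose admission of new elements is gated by a Count-Min sketch (a sketch-based technique). The class names `FilteredTopK` and `CountMinSketch` and the parameters `capacity`, `width` and `depth` are assumptions introduced only for this example.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: estimate(x) is never below the true count of x."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Deterministic per-row hash (Python's built-in hash() is salted per process).
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item: str) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += 1

    def estimate(self, item: str) -> int:
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))


class FilteredTopK:
    """Hypothetical hybrid: Space-Saving counters with sketch-gated admission.

    Invariant for every monitored item x: count - error <= true_freq(x) <= count.
    """

    def __init__(self, capacity: int, width: int = 1024, depth: int = 4) -> None:
        self.capacity = capacity
        self.sketch = CountMinSketch(width, depth)
        self.counters: dict[str, list[int]] = {}  # item -> [count, error]

    def update(self, item: str) -> None:
        self.sketch.add(item)
        if item in self.counters:
            self.counters[item][0] += 1
        elif len(self.counters) < self.capacity:
            # Nothing has been evicted or rejected yet, so this is a first occurrence.
            self.counters[item] = [1, 0]
        else:
            victim = min(self.counters, key=lambda k: self.counters[k][0])
            min_count = self.counters[victim][0]
            # Admit only if the sketch says the item may outrank the weakest counter;
            # otherwise its occurrence is recorded in the sketch alone.
            if self.sketch.estimate(item) > min_count:
                del self.counters[victim]
                # Every unmonitored item has true frequency <= min_count, so
                # min_count + 1 is an upper bound and min_count the maximum error.
                self.counters[item] = [min_count + 1, min_count]

    def top_k(self, k: int) -> list[tuple[str, int, int]]:
        """Return up to k (item, count, error) tuples, highest count first."""
        ranked = sorted(self.counters.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(item, c, e) for item, (c, e) in ranked[:k]]
```

In plain Space-Saving every unmonitored arrival evicts the minimum counter; here an element seen only once or twice is usually kept out because its sketch estimate does not exceed the current minimum, which reduces churn from rare elements while keeping the same count/error bounds. The linear `min` scan is for readability only; a practical implementation would keep the counters in an ordered structure.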
Keywords :
Most frequent , Top-k algorithms , Data stream frequencies , Estimation , Approximate algorithms
Journal title :
Information Sciences
Serial Year :
2010
Record number :
1214160