DocumentCode :
3576302
Title :
A Sampling Method of Finding Top-k Frequent Items on Timestamp-Based Stream
Author :
Wenfeng Li ; Liwei Wang ; Zhiyong Peng ; Deyi Li
Author_Institution :
State Key Lab. of Software Eng., Wuhan Univ., Wuhan, China
fYear :
2014
Firstpage :
221
Lastpage :
226
Abstract :
Data streams with high volume and complicated items are becoming increasingly common, and typical algorithms for finding top-k frequent items on streams, such as counter-based algorithms and sketch algorithms, increasingly fail to meet efficiency requirements. Our paper focuses on finding top-k frequent items on timestamp-based complicated streams and proposes an approximate solution based on sampling. Specifically, we design a multi-treap parallel priority algorithm to maintain a uniform sample over timestamp-based sliding windows. The top-k answers are approximated by processing the samples. We also theoretically analyze the relationship between item accuracy and sample size. Experimental analysis on real data shows that our method provides a flexible sample size to satisfy different accuracy requirements while ensuring good running efficiency.
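The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' multi-treap algorithm: it assumes plain priority sampling, in which each arriving item receives a random priority and the in-window item with the highest priority is the sample, and it swaps the treap for a simple deque. The names `WindowPrioritySampler` and `approximate_top_k`, the window length, and the number of samplers are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, deque


class WindowPrioritySampler:
    """Keeps one uniform sample from the last `window` time units (illustrative)."""

    def __init__(self, window, rng):
        self.window = window
        self.rng = rng
        # Entries are (timestamp, priority, item); priorities decrease front to back.
        self.candidates = deque()

    def add(self, timestamp, item):
        priority = self.rng.random()
        # An older entry with a lower priority can never become the window maximum.
        while self.candidates and self.candidates[-1][1] <= priority:
            self.candidates.pop()
        self.candidates.append((timestamp, priority, item))
        self.expire(timestamp)

    def expire(self, now):
        # Drop entries whose timestamp has left the window (now - window, now].
        while self.candidates and self.candidates[0][0] <= now - self.window:
            self.candidates.popleft()

    def sample(self, now):
        self.expire(now)
        return self.candidates[0][2] if self.candidates else None


def approximate_top_k(samplers, now, k):
    """Counts the items drawn by independent samplers and returns the k most common."""
    drawn = [s.sample(now) for s in samplers]
    counts = Counter(x for x in drawn if x is not None)
    return counts.most_common(k)


# Assumed toy stream: one item per time unit, "b" at every third timestamp.
rng = random.Random(0)
samplers = [WindowPrioritySampler(window=10, rng=rng) for _ in range(200)]
for t in range(100):
    for s in samplers:
        s.add(t, "b" if t % 3 == 0 else "a")
print(approximate_top_k(samplers, now=99, k=2))
```

Each sampler yields one draw, so the independent samplers together form a sample with replacement whose size can be raised or lowered to trade accuracy for cost, which is the trade-off the abstract describes.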
Keywords :
data mining; parallel algorithms; sampling methods; approximate solution; counter-based algorithm; data stream; item accuracy; multitreap parallel priority algorithm; running efficiency; sampling method; sketch algorithm; timestamp-based complicated stream; timestamp-based sliding window; timestamp-based stream; top-k answer; top-k frequent item; Accuracy; Algorithm design and analysis; Approximation algorithms; Approximation methods; Reservoirs; Sampling methods; Tin; multi-treap parallel priority algorithm; timestamp-based stream; top-k frequent items
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information System and Application Conference (WISA), 2014 11th
Print_ISBN :
978-1-4799-5726-2
Type :
conf
DOI :
10.1109/WISA.2014.47
Filename :
7058016