• DocumentCode
    3576302
  • Title
    A Sampling Method of Finding Top-k Frequent Items on Timestamp-Based Stream
  • Author
    Wenfeng Li ; Liwei Wang ; Zhiyong Peng ; Deyi Li
  • Author_Institution
    State Key Lab. of Software Eng., Wuhan Univ., Wuhan, China
  • fYear
    2014
  • Firstpage
    221
  • Lastpage
    226
  • Abstract
    Data streams with high volume and complicated items are becoming increasingly common, and typical algorithms for finding top-k frequent items on streams, such as counter-based algorithms and sketch algorithms, are gradually failing to keep up with efficiency requirements. This paper focuses on finding top-k frequent items on timestamp-based complicated streams and proposes an approximate solution based on sampling. Specifically, we design a multi-treap parallel priority algorithm to maintain a uniform sample over timestamp-based sliding windows. The top-k answers are then approximated by processing the samples. We also theoretically analyze the relationship between item accuracy and sample size. Experimental analysis on real data shows that our method provides a flexible sample size to satisfy different accuracy requirements while ensuring good running efficiency.
  • Keywords
    data mining; parallel algorithms; sampling methods; approximate solution; counter-based algorithm; data stream; item accuracy; multitreap parallel priority algorithm; running efficiency; sampling method; sketch algorithm; timestamp-based complicated stream; timestamp-based sliding window; timestamp-based stream; top-k answer; top-k frequent item; Accuracy; Algorithm design and analysis; Approximation algorithms; Approximation methods; Reservoirs; Sampling methods; Tin; multi-treap parallel priority algorithm; timestamp-based stream; top-k frequent items
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information System and Application Conference (WISA), 2014 11th
  • Print_ISBN
    978-1-4799-5726-2
  • Type
    conf
  • DOI
    10.1109/WISA.2014.47
  • Filename
    7058016
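
The abstract names the approach (priority sampling over a timestamp-based sliding window, with top-k answers estimated from the sample) but gives no algorithmic detail. The following is only a minimal sketch of plain priority sampling, not the paper's multi-treap algorithm: a monotonic deque stands in for each treap, the samplers run sequentially rather than in parallel, and timestamps are assumed to arrive in non-decreasing order. The names WindowSampler and approx_top_k are hypothetical.

```python
import random
from collections import Counter, deque


class WindowSampler:
    """Keeps one uniform sample of the items with timestamp in (now - window, now].

    Assumption: a deque replaces the treap named in the abstract.
    """

    def __init__(self, window, rng):
        self.window = window
        self.rng = rng
        # Entries are (timestamp, priority, item); priorities strictly
        # decrease from front to back.
        self.candidates = deque()

    def add(self, timestamp, item):
        priority = self.rng.random()
        # An older entry with a lower priority expires first and can never
        # again be the maximum, so it is safe to discard it.
        while self.candidates and self.candidates[-1][1] <= priority:
            self.candidates.pop()
        self.candidates.append((timestamp, priority, item))

    def sample(self, now):
        # Drop entries that have left the window.
        while self.candidates and self.candidates[0][0] <= now - self.window:
            self.candidates.popleft()
        # The front entry has the highest priority in the window; because
        # priorities are i.i.d., it is a uniform pick among window items.
        return self.candidates[0][2] if self.candidates else None


def approx_top_k(samplers, now, k):
    """Estimate the k most frequent window items from the independent samples."""
    drawn = (s.sample(now) for s in samplers)
    counts = Counter(item for item in drawn if item is not None)
    return [item for item, _ in counts.most_common(k)]


# Usage: 200 independent samplers give a sample of size 200 (with replacement).
rng = random.Random(0)
samplers = [WindowSampler(window=10, rng=rng) for _ in range(200)]
for t in range(100):
    item = "a" if t % 2 == 0 else f"x{t}"  # "a" fills half of every window
    for s in samplers:
        s.add(t, item)
top = approx_top_k(samplers, now=99, k=1)
```

The number of samplers plays the role of the sample size: more samplers cost more memory and update time but make the estimated ranking more stable, which is the accuracy versus sample size trade-off the abstract refers to.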