Title :
Efficiently Finding Top-K Items from Evolving Distributed Data Streams
Author :
Baoyuan Qi ; Gang Ma ; Zhongzhi Shi ; Wei Wang
Author_Institution :
Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China
Abstract :
The problem of efficiently finding the top-k frequent items has attracted much attention in recent years. Two main challenges are the storage constraints of each processing node and the intrinsically evolving nature of the data streams. In this paper, we propose a method that tackles these two challenges with a space-saving algorithm and a gossip-based algorithm, respectively. Our method is implemented on SAMOA (Scalable Advanced Massive Online Analysis), a machine learning framework. The experimental results show its effectiveness and scalability.
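For readers unfamiliar with the space-saving idea named in the abstract, the following is a minimal Python sketch of the standard Space-Saving counter summary, which tracks frequent items in bounded memory. It is not the authors' implementation and does not cover the gossip-based or SAMOA parts of the method; the class name `SpaceSaving`, the `capacity` parameter, and the methods `offer` and `top_k` are hypothetical names chosen for illustration.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Standard Space-Saving summary holding at most `capacity` counters.

    Illustrative sketch only; all names here are assumptions, not taken
    from the paper.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}

    def offer(self, item: Hashable) -> None:
        """Process one stream element."""
        if item in self.counts:
            # Item is already monitored: increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free slot available: start monitoring the item.
            self.counts[item] = 1
        else:
            # Summary is full: evict the item with the smallest counter.
            # The newcomer inherits that count plus one, so its count may
            # overestimate its true frequency but never underestimates it.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.counts[item] = floor + 1

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return up to k monitored items, highest estimated count first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]
```

A linear scan for the minimum keeps the sketch short; a practical implementation would more likely use a structure that exposes the minimum counter in constant time.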
Keywords :
data mining; learning (artificial intelligence); SAMOA framework; evolving distributed data streams; gossip-based algorithm; scalable advanced massive online analysis machine learning framework; space-saving algorithm; top-k frequent items; Data mining; Distributed databases; Machine learning algorithms; Monitoring; Peer-to-peer computing; Protocols; Radiation detectors;
Conference_Title :
Semantics, Knowledge and Grids (SKG), 2014 10th International Conference on
Conference_Location :
Beijing
DOI :
10.1109/SKG.2014.18