Title :
A new parallelization model for detecting temporal bursts in large-scale document streams on a multi-core CPU
Author :
Tamura, Keiichi ; Kitakami, Hajime
Author_Institution :
Grad. Sch. of Inf. Sci., Hiroshima City Univ., Hiroshima, Japan
Abstract :
Burstiness is the simplest yet most robust criterion for detecting topics and events in online documents. Because online documents have a temporal order, they are referred to as document streams. Kleinberg's temporal burst detection algorithm is the most successful algorithm for detecting bursty periods related to a topic- or event-related keyword: it finds the time periods in which a keyword occurs at a high frequency. Large-scale online documents have recently become increasingly common on social media, so speeding up burst-detection processing is one of the most important issues in this era of big data. In this paper, we propose a novel parallelization model, called the hybrid parallelization model with a hidden I/O thread, to enable the parallel processing of Kleinberg's temporal burst detection algorithm on a multi-core CPU. In a multi-core CPU environment, I/O latency is a critical obstacle to improving the performance of a parallelization model. To hide the I/O latency automatically, the proposed parallelization model utilizes speculative I/Os. Experimental results on actual large-scale document streams show that the proposed parallelization model performs well compared with a conventional parallelization model.
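To make the idea of finding high-frequency periods for a keyword concrete, the following is a minimal sequential sketch of a two-state burst detector in the style commonly attributed to Kleinberg's batched formulation, solved with the Viterbi algorithm. It is not the parallelization model proposed in the paper. The function name `detect_bursts`, the defaults `s=2.0` and `gamma=1.0`, and the sample counts are assumptions chosen for illustration only.

```python
import math


def detect_bursts(relevant, total, s=2.0, gamma=1.0):
    """Label each time batch 0 (baseline) or 1 (bursty).

    relevant[t]: documents in batch t that contain the keyword.
    total[t]:    all documents in batch t.
    s:           assumed ratio of the bursty rate to the baseline rate.
    gamma:       assumed scaling of the cost of entering the bursty state.
    """
    n = len(relevant)
    if n == 0 or n != len(total):
        raise ValueError("relevant and total must be non-empty and equal in length")

    R, D = sum(relevant), sum(total)
    if R == 0 or R == D:
        # The keyword rate never varies, so there is nothing to detect.
        return [0] * n

    p0 = R / D                    # baseline keyword rate
    p1 = min(s * p0, 0.9999)      # elevated rate, kept below 1
    probs = (p0, p1)
    up_cost = gamma * math.log(n)  # cost of moving 0 -> 1; moving 1 -> 0 is free

    def fit_cost(p, r, d):
        # Negative log binomial likelihood. The binomial coefficient is
        # identical for both states, so it is omitted.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    # Viterbi pass: cost[j] is the cheapest cost of any state sequence
    # that ends in state j. The automaton starts in state 0.
    cost = [0.0, math.inf]
    back = []
    for r, d in zip(relevant, total):
        new_cost, choice = [], []
        for j in (0, 1):
            cands = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if cands[0] <= cands[1] else 1
            new_cost.append(cands[i_best] + fit_cost(probs[j], r, d))
            choice.append(i_best)
        cost = new_cost
        back.append(choice)

    # Trace back the cheapest state sequence.
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    return states


if __name__ == "__main__":
    # Hypothetical counts: the keyword spikes in batches 3 to 5.
    print(detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [100] * 8))
```

Each keyword can be processed independently of the others, and the per-batch counts must be read from storage before the pass can start, which is presumably why I/O latency matters when many keywords are processed in parallel.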
Keywords :
Big Data; information retrieval; multiprocessing systems; parallel processing; social networking (online); I/O latency; Kleinberg's temporal burst detection algorithm; big data; burst-detection processing; event-related keyword; hidden I/O thread; hybrid parallelization model; large-scale document streams; multicore CPU environment; online document topic detection; parallel processing; parallelization model; social media; temporal burst detection; topic-related keyword; Data models; Detection algorithms; Instruction sets; Media; Message systems; Parallel processing; Viterbi algorithm; Burst detection; Document stream; Multi-core CPU; Parallel processing
Conference_Title :
Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on
Conference_Location :
San Diego, CA
DOI :
10.1109/SMC.2014.6973960