DocumentCode
173215
Title
A new parallelization model for detecting temporal bursts in large-scale document streams on a multi-core CPU
Author
Tamura, Keiichi ; Kitakami, Hajime
Author_Institution
Grad. Sch. of Inf. Sci., Hiroshima City Univ., Hiroshima, Japan
fYear
2014
fDate
5-8 Oct. 2014
Firstpage
519
Lastpage
524
Abstract
Burstiness is the simplest but the most robust criterion for detecting topics and events in online documents. Online documents are referred to as document streams because they have a temporal order. Kleinberg's temporal burst detection algorithm is the most successful algorithm for detecting bursty periods related to a topic- or event-related keyword. Kleinberg's temporal burst detection algorithm aims to find certain time periods in which a keyword occurs at a high frequency. In recent times, large-scale online documents are increasingly common on social media. Therefore, speed-up of burst-detection processing is one of the most important issues in this era of big data. In this paper, we propose a novel parallelization model, called the hybrid parallelization model with a hidden I/O thread, to enable the parallel processing of Kleinberg's temporal burst detection algorithm on a multi-core CPU. In a multi-core CPU environment, I/O latency is a critical issue for improving the performance of a parallelization model. To automatically hide the I/O latency, the proposed parallelization model utilizes speculative I/Os. The results of experiments using actual large-scale document streams show that the proposed parallelization model performs well compared with a conventional parallelization model.
Keywords
Big Data; information retrieval; multiprocessing systems; parallel processing; social networking (online); I/O latency; Kleinberg's temporal burst detection algorithm; big data; burst-detection processing; event-related keyword; hidden I/O thread; hybrid parallelization model; large-scale document streams; multicore CPU environment; online document topic detection; parallel processing; parallelization model; social media; temporal burst detection; topic-related keyword; Data models; Detection algorithms; Instruction sets; Media; Message systems; Parallel processing; Viterbi algorithm; Burst detection; Document stream; Multi-core CPU; Parallel processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on
Conference_Location
San Diego, CA
Type
conf
DOI
10.1109/SMC.2014.6973960
Filename
6973960