  • DocumentCode
    173215
  • Title
    A new parallelization model for detecting temporal bursts in large-scale document streams on a multi-core CPU
  • Author
    Tamura, Keiichi ; Kitakami, Hajime
  • Author_Institution
    Grad. Sch. of Inf. Sci., Hiroshima City Univ., Hiroshima, Japan
  • fYear
    2014
  • fDate
    5-8 Oct. 2014
  • Firstpage
    519
  • Lastpage
    524
  • Abstract
    Burstiness is the simplest yet most robust criterion for detecting topics and events in online documents. Online documents are referred to as document streams because they have a temporal order. Kleinberg's temporal burst detection algorithm is the most successful algorithm for detecting bursty periods associated with topic- or event-related keywords; it aims to find time periods in which a keyword occurs at a high frequency. Large-scale online document collections are increasingly common on social media, so speeding up burst detection is one of the most important issues in this era of big data. In this paper, we propose a novel parallelization model, called the hybrid parallelization model with a hidden I/O thread, that enables parallel processing of Kleinberg's temporal burst detection algorithm on a multi-core CPU. In a multi-core CPU environment, I/O latency is a critical factor in the performance of a parallelization model. To hide this I/O latency automatically, the proposed parallelization model uses speculative I/O. Experiments using actual large-scale document streams show that the proposed parallelization model performs well compared with a conventional parallelization model.
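    Kleinberg's algorithm, as summarized above, looks for periods in which a keyword's frequency is unusually high. The following is a minimal Python sketch of the batched two-state variant of that idea: each time batch is labeled baseline (0) or bursty (1) by a Viterbi-style dynamic program that balances how well each state's rate explains the keyword count against a penalty for entering the bursty state. It illustrates the sequential algorithm under stated assumptions and is not the authors' parallel implementation; the function name `kleinberg_two_state`, the defaults `s = 2.0` and `gamma = 1.0`, and the 0.9999 cap on the bursty rate are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def kleinberg_two_state(r: Sequence[int], d: Sequence[int],
                        s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Label each batch 0 (baseline) or 1 (bursty) for a single keyword.

    r[t]  -- documents in batch t that contain the keyword
    d[t]  -- total documents in batch t
    s     -- assumed ratio of the bursty rate to the baseline rate (hypothetical default)
    gamma -- assumed weight on the cost of entering the bursty state (hypothetical default)
    """
    n = len(r)
    if n == 0 or sum(d) == 0:
        return [0] * n
    p0 = sum(r) / sum(d)            # baseline rate: overall keyword fraction
    if p0 <= 0.0 or p0 >= 1.0:      # degenerate stream: no batch can stand out
        return [0] * n
    p = [p0, min(s * p0, 0.9999)]   # bursty rate, capped below 1 (assumption)
    up_cost = gamma * math.log(n)   # cost of a 0 -> 1 transition; 1 -> 0 is free

    def emit_cost(i: int, rt: int, dt: int) -> float:
        # Negative log-likelihood of a binomial count of rt hits out of dt
        # under rate p[i]; the binomial coefficient is the same for both
        # states, so it is dropped without changing the optimal sequence.
        return -(rt * math.log(p[i]) + (dt - rt) * math.log(1.0 - p[i]))

    best = [0.0, math.inf]          # the automaton starts in state 0
    back: List[List[int]] = []      # back[t][j]: best predecessor of state j at batch t
    for t in range(n):
        new_best = [0.0, 0.0]
        ptr = [0, 0]
        for j in (0, 1):
            cands = [best[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_star = 0 if cands[0] <= cands[1] else 1
            new_best[j] = cands[i_star] + emit_cost(j, r[t], d[t])
            ptr[j] = i_star
        best = new_best
        back.append(ptr)

    # Trace the minimum-cost state sequence backwards (Viterbi backtracking).
    state = 0 if best[0] <= best[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    return states
```

    For example, `kleinberg_two_state([5, 5, 5, 50, 50, 5, 5], [100] * 7)` returns `[0, 0, 0, 1, 1, 0, 0]`, marking the two high-frequency batches as bursty. Because the computation for one keyword does not depend on any other keyword, a routine like this is a natural unit of work to spread across CPU cores; how the paper's hybrid model schedules that work and overlaps it with speculative I/O is not reproduced here.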
  • Keywords
    Big Data; information retrieval; multiprocessing systems; parallel processing; social networking (online); I/O latency; Kleinberg's temporal burst detection algorithm; big data; burst-detection processing; event-related keyword; hidden I/O thread; hybrid parallelization model; large-scale document streams; multicore CPU environment; online document topic detection; parallel processing; parallelization model; social media; temporal burst detection; topic-related keyword; Data models; Detection algorithms; Instruction sets; Media; Message systems; Parallel processing; Viterbi algorithm; Burst detection; Document stream; Multi-core CPU; Parallel processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on
  • Conference_Location
    San Diego, CA
  • Type
    conf
  • DOI
    10.1109/SMC.2014.6973960
  • Filename
    6973960