• DocumentCode
    109887
  • Title
    The discovery of burst topic and its intermittent evolution in our real world
  • Author
    Siliang Tang ; Yin Zhang ; Hanqi Wang ; Ming Chen ; Fei Wu ; Yueting Zhuang
  • Author_Institution
    Inst. of Artificial Intell., Zhejiang Univ., Hangzhou, China
  • Volume
    10
  • Issue
    3
  • fYear
    2013
  • fDate
    Mar-13
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    A large number of documents are now available on online news sites (e.g., CNN and NYT). Making use of these documents, for example by discovering a burst topic and tracking its evolution, is therefore a significant challenge. In this paper, a novel topic model called intermittent Evolution LDA (iELDA) is proposed. In iELDA, the time-evolving documents are divided into many small epochs. iELDA uses the detected global topics as priors to guide the detection of an emerging topic and to track its evolution across epochs. As a natural extension of the traditional Latent Dirichlet Allocation (LDA) and the Dynamic Topic Model (DTM), iELDA has an additional advantage: it can discover the intermittent recurring pattern of a burst topic. We apply iELDA to real-world data from NYT. The results demonstrate that iELDA can appropriately capture a burst topic and track its intermittent evolution, and that it achieves better predictive ability than other related topic models.
  • Keywords
    document handling; CNN; NYT; burst topic discovery; dynamic topic model; epochs; global topics detection; iELDA; intermittent evolution LDA; intermittent recurring pattern; latent Dirichlet allocation; natural extension; online documents; online news sites; predictive ability; real-world data; time-evolving documents; Cluster approximation; Context modeling; Data models; Iterative methods; Predictive models; Resource management; LDA; iterative clustering model; time series;
  • fLanguage
    English
  • Journal_Title
    Communications, China
  • Publisher
    IEEE
  • ISSN
    1673-5447
  • Type
    jour
  • DOI
    10.1109/CC.2013.6488826
  • Filename
    6488826