• DocumentCode
    1379247
  • Title
    Keep It Simple with Time: A Reexamination of Probabilistic Topic Detection Models
  • Author
    He, Qi ; Chang, Kuiyu ; Lim, Ee-Peng ; Banerjee, Arindam
  • Author_Institution
    Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., State College, PA, USA
  • Volume
    32
  • Issue
    10
  • fYear
    2010
  • Firstpage
    1795
  • Lastpage
    1808
  • Abstract
    Topic detection (TD) is a fundamental research issue in the Topic Detection and Tracking (TDT) community with practical implications; TD helps analysts to separate the wheat from the chaff among the thousands of incoming news streams. In this paper, we propose a simple and effective topic detection model called the temporal Discriminative Probabilistic Model (DPM), which is shown to be theoretically equivalent to the classic vector space model with feature selection and temporally discriminative weights. We compare DPM to its various probabilistic cousins, ranging from mixture models like von Mises-Fisher (vMF) to mixed membership models like Latent Dirichlet Allocation (LDA). Benchmark results on the TDT3 data set show that sophisticated models, such as vMF and LDA, do not necessarily lead to better results; in the case of LDA, notably the worst performance was obtained under variational inference, likely due to the large number of LDA model parameters involved in document-level topic detection. In contrast, a relatively simple time-aware probabilistic model such as DPM suffices for both offline and online topic detection tasks, making DPM a theoretically elegant and effective model for practical topic detection.
  • Keywords
    document handling; information retrieval; probability; TDT3 data set; feature selection; latent Dirichlet allocation; news streams; probabilistic topic detection models; temporal discriminative probabilistic model; topic detection and tracking; vMF; Character generation; Data mining; Event detection; Finance; Frequency; Helium; Hurricanes; Linear discriminant analysis; Mathematical model; Nominations and elections; DPM; TFIDF; Topic detection; bursty feature; online; probabilistic model; time-aware
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2009.203
  • Filename
    5374412