  • DocumentCode
    564064
  • Title
    An integrated probabilistic text clustering model with segment-based and word order evidence
  • Author
    Dai, Lin
  • Author_Institution
    Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing, China
  • fYear
    2011
  • fDate
    Nov. 29 2011-Dec. 1 2011
  • Firstpage
    64
  • Lastpage
    70
  • Abstract
    Text clustering is an important research topic with many practical applications. Traditional clustering algorithms such as K-means and Probabilistic Latent Semantic Indexing (pLSI) simply treat each document as a single chunk of text and also ignore important word order information, which limits their performance. This paper proposes an integrated probabilistic model to explicitly combine the evidence from individual segments within a document and the word order information. Based on this model, a text clustering framework is proposed. Experiments on test datasets indicate substantial performance gains over state-of-the-art algorithms.
  • Keywords
    indexing; pattern clustering; probability; text analysis; integrated probabilistic model; integrated probabilistic text clustering model; k-means indexing; probabilistic latent semantic indexing; segment-based evidence; word order evidence; Adaptation models; Clustering algorithms; Entropy; Indexing; Noise; Probabilistic logic; Semantics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Management and Service (ICIPM), 2011 7th International Conference on
  • Conference_Location
    Jeju
  • Print_ISBN
    978-1-4577-0471-0
  • Type
    conf
  • Filename
    6222140