DocumentCode
564064
Title
An integrated probabilistic text clustering model with segment-based and word order evidence
Author
Dai, Lin
Author_Institution
Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing, China
fYear
2011
fDate
Nov. 29, 2011 - Dec. 1, 2011
Firstpage
64
Lastpage
70
Abstract
Text clustering is an important research topic with many practical applications. Traditional clustering algorithms such as K-means and Probabilistic Latent Semantic Indexing (pLSI) treat each document as a single undivided chunk of text and ignore word order information, which limits their performance. This paper proposes an integrated probabilistic model that explicitly combines evidence from the individual segments within a document with word order information. Based on this model, a text clustering framework is proposed. Experiments on test datasets show substantial performance gains over state-of-the-art algorithms.
Keywords
indexing; pattern clustering; probability; text analysis; integrated probabilistic model; integrated probabilistic text clustering model; k-means indexing; probabilistic latent semantic indexing; segment-based evidence; word order evidence; Adaptation models; Clustering algorithms; Entropy; Indexing; Noise; Probabilistic logic; Semantics;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Information Management and Service (ICIPM), 2011 7th International Conference on
Conference_Location
Jeju
Print_ISBN
978-1-4577-0471-0
Type
conf
Filename
6222140