DocumentCode :
564064
Title :
An integrated probabilistic text clustering model with segment-based and word order evidence
Author :
Dai, Lin
Author_Institution :
Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing, China
fYear :
2011
fDate :
Nov. 29 2011-Dec. 1 2011
Firstpage :
64
Lastpage :
70
Abstract :
Text clustering is an important research topic with many practical applications. Traditional clustering algorithms such as K-means and Probabilistic Latent Semantic Indexing (pLSI) simply treat each document as a single chunk of text and also ignore important word order information, which limits their performance. This paper proposes an integrated probabilistic model to explicitly combine the evidence from individual segments within a document and the word order information. Based on this model, a text clustering framework is proposed. Experiments on test datasets indicate substantial performance gains over state-of-the-art algorithms.
Keywords :
indexing; pattern clustering; probability; text analysis; integrated probabilistic model; integrated probabilistic text clustering model; k-means indexing; probabilistic latent semantic indexing; segment-based evidence; word order evidence; Adaptation models; Clustering algorithms; Entropy; Indexing; Noise; Probabilistic logic; Semantics;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Information Management and Service (ICIPM), 2011 7th International Conference on
Conference_Location :
Jeju
Print_ISBN :
978-1-4577-0471-0
Type :
conf
Filename :
6222140