Regional Information Center for Science and Technology - An integrated probabilistic text clustering model with segment-based and word order evidence

DocumentCode :

564064

Title :

An integrated probabilistic text clustering model with segment-based and word order evidence

Author :

Dai, Lin

Author_Institution :

Sch. of Comput. Sci., Beijing Inst. of Technol., Beijing, China

fYear :

2011

fDate :

Nov. 29 2011-Dec. 1 2011

Firstpage :

Lastpage :

Abstract :

Text clustering is an important research topic with many practical applications. Traditional clustering algorithms such as K-means and Probabilistic Latent Semantic Indexing (pLSI) simply treat each document as a single chunk of text and also ignore important word order information, which limits their performance. This paper proposes an integrated probabilistic model to explicitly combine the evidence from individual segments within a document and the word order information. Based on this model, a text clustering framework is proposed. Experiments on test datasets indicate substantial performance gains over state-of-the-art algorithms.

Keywords :

indexing; pattern clustering; probability; text analysis; integrated probabilistic model; integrated probabilistic text clustering model; k-means indexing; probabilistic latent semantic indexing; segment-based evidence; word order evidence; Adaptation models; Clustering algorithms; Entropy; Indexing; Noise; Probabilistic logic; Semantics;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Advanced Information Management and Service (ICIPM), 2011 7th International Conference on

Conference_Location :

Jeju

Print_ISBN :

978-1-4577-0471-0

Type :

conf

Filename :

6222140

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=564064