Title :
The clustering algorithm for Chinese texts based on Lingo
Author :
Xiuqin Lin ; Qianhao Zhang ; Gengyu Wei
Author_Institution :
Beijing Key Lab. of Intell. Telecommun. Software & Multimedia, Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
This paper presents a novel Chinese text clustering algorithm, named C-Lingo (Chinese Lingo), which improves the performance of the Lingo algorithm by replacing the singular value decomposition (SVD) with non-negative matrix factorization (NMF). C-Lingo also adds Chinese word segmentation and stop-word removal so that it can process web pages containing Chinese text. The evaluation results show that the C-Lingo algorithm achieves better performance.
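The core idea in the abstract, factoring a term-document matrix with NMF instead of SVD after stop-word removal, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the stop-word list, and every function name below are hypothetical. The documents are assumed to be segmented into words already, and the NMF uses the standard multiplicative update rules, which the abstract does not specify. Lingo's phrase extraction and cluster-label induction are not reproduced.

```python
import random

# Hypothetical toy corpus; each document is assumed to be segmented already.
DOCS = [
    ["文本", "聚类", "的", "算法"],
    ["聚类", "算法", "和", "文本"],
    ["矩阵", "分解", "的", "方法"],
    ["矩阵", "分解", "和", "方法"],
]
STOP_WORDS = {"的", "和"}  # illustrative stop-word list, not taken from the paper


def build_term_document_matrix(docs, stop_words):
    """Return (vocabulary, matrix): one row per term, one column per document."""
    vocab = sorted({w for doc in docs for w in doc if w not in stop_words})
    matrix = [[float(doc.count(term)) for doc in docs] for term in vocab]
    return vocab, matrix


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate v by w * h with non-negative factors (multiplicative updates)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Strictly positive random start so that no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


def assign_clusters(h):
    """Assign each document (a column of h) to its largest-weight factor."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


vocab, V = build_term_document_matrix(DOCS, STOP_WORDS)
W, H = nmf(V, rank=2)
labels = assign_clusters(H)
print(vocab)
print(labels)
```

Because both factors stay non-negative, each column of `W` can be read as a weighted bag of terms, which is one common motivation for preferring NMF to SVD when a factor is meant to describe a cluster. In this toy corpus the first two documents share one vocabulary and the last two share another, so the two pairs should end up in separate groups.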
Keywords :
computational linguistics; natural language processing; pattern clustering; singular value decomposition; text analysis; C-Lingo; Chinese Lingo; Chinese text clustering algorithm; Chinese word segmentation; nonnegative matrix factorization; stop words; Algorithm design and analysis; Clustering algorithms; Contamination; Matrix decomposition; Software algorithms; Text processing; Chinese text clustering; Lingo; NMF; SVD
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Print_ISBN :
978-1-61284-180-9
DOI :
10.1109/FSKD.2011.6019740