DocumentCode :
2918301
Title :
Chinese Text Clustering Method Based on Semantics and Special Domain
Author :
Jianquan, Dong ; Jinchao, Zhang
Author_Institution :
Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
fYear :
2009
fDate :
7-8 Nov. 2009
Firstpage :
195
Lastpage :
199
Abstract :
Current text clustering algorithms, when applied to Chinese texts, ignore semantic relationships between words and suffer from high data dimensionality and computational complexity. This paper presents a new method for clustering Chinese texts in a specific field based on semantics: the TCBS (Text Clustering Based on Semantics) algorithm. Built on the agglomerative hierarchical clustering algorithm, it represents Chinese texts by their characteristic words and sets a relative threshold to improve clustering efficiency. Experimental results show that, compared with traditional algorithms, TCBS effectively improves clustering quality.
Keywords :
natural language processing; pattern clustering; text analysis; Chinese text clustering; agglomerative hierarchical clustering algorithm; computational complexity; data dimensionality; word semantic relationship; Algorithm design and analysis; Clustering algorithms; Clustering methods; Computational complexity; Data engineering; Data mining; Information systems; Machine learning; Sun; Text processing; TCBS; Text clustering; characteristic words; semantic; similarity;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Information Systems and Mining, 2009. WISM 2009. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3817-4
Type :
conf
DOI :
10.1109/WISM.2009.47
Filename :
5369484