DocumentCode
534198
Title
Clustering Efficient Method on Mass Chinese Text Based on Semantic Concept
Author
Jinling, Liu ; Hong, Zhou
Author_Institution
Comput. Eng. Fac., Huaiyin Inst. of Technol., Huaian, China
Volume
2
fYear
2010
fDate
16-18 July 2010
Firstpage
151
Lastpage
155
Abstract
In current approaches to Chinese text clustering, most algorithms are limited by poor scalability to large data sets and by the limited interpretability of their results. This paper presents an efficient Chinese text clustering method based on semantic concepts. Starting from the text itself, the method uses the classified subject-word hierarchy of the Thesaurus of Modern Chinese to extract conceptional tuples from a high-dimensional collection of text vectors, forming high-level concepts that express the clustering results. The samples are then partitioned according to these high-level concepts, which completes the clustering process. While preserving the accuracy of the clustering results, the method greatly reduces the amount of data that must be processed and improves the scalability of the clustering algorithm. Experimental results show that the algorithm achieves satisfactory clustering results with higher efficiency.
Keywords
natural language processing; pattern clustering; text analysis; Chinese text clustering; classified hierarchy subject word; clustering efficient method; high-dimensional text vector collection; mass Chinese text; modern Chinese thesaurus; semantic concept; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Clustering methods; Dictionaries; Semantics; Thesauri; Chinese text; classified dictionary; clustering; conceptional tuple; semantic
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology and Applications (IFITA), 2010 International Forum on
Conference_Location
Kunming
Print_ISBN
978-1-4244-7621-3
Electronic_ISBN
978-1-4244-7622-0
Type
conf
DOI
10.1109/IFITA.2010.77
Filename
5634880