Title :
Clustering Efficient Method on Mass Chinese Text Based on Semantic Concept
Author :
Jinling, Liu ; Hong, Zhou
Author_Institution :
Comput. Eng. Fac., Huaiyin Inst. of Technol., Huaian, China
Abstract :
Most current approaches to Chinese text clustering are limited by poor scalability to large data sets and by results that are hard to interpret. This paper presents an efficient Chinese text clustering method based on semantic concepts. Starting from the text itself, the method uses the classified subject-word hierarchy of the Thesaurus of Modern Chinese to extract conceptual tuples from a high-dimensional collection of text vectors, forming high-level concepts that express the clustering results. The samples are then partitioned according to these high-level concepts, which completes the clustering process. While preserving the accuracy of the clustering results, the method greatly reduces the amount of data that must be processed and improves the scalability of the clustering algorithm. Experimental results show that the algorithm achieves satisfactory clustering quality with higher efficiency.
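Illustrative sketch (not the authors' implementation): the abstract gives no algorithmic detail, so the following Python example only shows the general idea of mapping words to thesaurus categories and grouping documents by their dominant high-level concept. The thesaurus entries, the category codes, and the names TOY_THESAURUS, to_concept, concept_tuple, and cluster_by_concept are all assumptions made for illustration, and the input is assumed to be already word-segmented.

```python
from collections import Counter, defaultdict

# Hypothetical toy thesaurus: word -> hierarchical category code.
# These codes are invented for illustration; they are not taken from
# the Thesaurus of Modern Chinese.
TOY_THESAURUS = {
    "篮球": "Sports/Ball",
    "足球": "Sports/Ball",
    "比赛": "Sports/Event",
    "股票": "Finance/Market",
    "利率": "Finance/Market",
    "银行": "Finance/Institution",
}


def to_concept(word, level=1):
    """Map a word to its category code truncated to `level` hierarchy levels."""
    code = TOY_THESAURUS.get(word)
    if code is None:
        return None  # words outside the thesaurus are ignored
    return "/".join(code.split("/")[:level])


def concept_tuple(tokens, level=1, top_k=1):
    """Return the top_k most frequent concepts of a document as a sorted tuple."""
    concepts = (to_concept(t, level) for t in tokens)
    counts = Counter(c for c in concepts if c is not None)
    # Rank by descending count, then by name, so ties are resolved deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(sorted(name for name, _ in ranked[:top_k]))


def cluster_by_concept(docs, level=1, top_k=1):
    """Group document ids that share the same concept tuple."""
    clusters = defaultdict(list)
    for doc_id, tokens in docs.items():
        clusters[concept_tuple(tokens, level, top_k)].append(doc_id)
    return dict(clusters)


# Documents are assumed to be pre-segmented into words.
docs = {
    "d1": ["篮球", "比赛", "昨天"],
    "d2": ["足球", "比赛"],
    "d3": ["股票", "利率", "银行"],
    "d4": ["银行", "利率"],
}
clusters = cluster_by_concept(docs)
for concepts, members in clusters.items():
    print(concepts, members)
```

In this sketch each document is reduced from a bag of words to a short tuple of category labels, which is one way the amount of data to be processed could shrink while the cluster labels stay human-readable. Raising level or top_k gives finer-grained groups.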
Keywords :
natural language processing; pattern clustering; text analysis; Chinese text clustering; classified hierarchy subject word; clustering efficient method; high-dimensional text vector collection; mass Chinese text; modern Chinese thesaurus; semantic concept; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Clustering methods; Dictionaries; Semantics; Thesauri; chinese text; classified dictionary; clustering; conceptional tuple; semantic;
Conference_Title :
Information Technology and Applications (IFITA), 2010 International Forum on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-7621-3
Electronic_ISBN :
978-1-4244-7622-0
DOI :
10.1109/IFITA.2010.77