Title :
Research on the parallel text clustering algorithm based on the semantic tree
Author :
Liu, Gangfeng ; Wang, Yunlan ; Zhao, Tianhai ; Li, Dongyang
Author_Institution :
Center for High Performance Comput., Northwestern Polytech. Univ., Xi'an, China
fDate :
Nov. 29 2011-Dec. 1 2011
Abstract :
Because text clustering algorithms that rely only on word frequency ignore the semantic relationships between words, their results are imprecise. This paper proposes a semantic-tree-based text clustering algorithm built on WordNet. To reduce time complexity, we adopt a parallel algorithm based on a multi-process model, in which several processes are started at the same time. The master process is responsible for partitioning the data, sending and collecting information, and clustering the results. The slave processes are mainly responsible for word-frequency statistics, weight calculation, and obtaining the hypernyms of words from the semantic tree. Experimental results show that the algorithm achieves both higher precision and lower time complexity.
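The abstract describes the master/slave division of labour but not the concrete weighting scheme or clustering algorithm. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: a tiny hard-coded `HYPERNYMS` table stands in for WordNet, `STOP_WORDS` is an assumed stop-word list, weights are relative term frequencies, and the master clusters with cosine-similarity k-means. All function names (`partition`, `slave_task`, `cluster_corpus`, `run_parallel`) are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stand-in for the WordNet-based semantic tree: word -> hypernym.
# A real implementation would look hypernyms up in WordNet instead.
HYPERNYMS = {
    "cat": "animal", "dog": "animal", "horse": "animal",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
}
STOP_WORDS = {"the", "a", "an", "and"}  # assumed, minimal stop-word list


def partition(items, n):
    """Master: split the corpus into at most n contiguous chunks."""
    size = max(1, math.ceil(len(items) / n))
    return [items[i:i + size] for i in range(0, len(items), size)]


def slave_task(docs):
    """Slave: replace words by hypernyms, count them, weight by relative frequency."""
    vectors = []
    for doc in docs:
        words = [w for w in doc.lower().split() if w not in STOP_WORDS]
        counts = Counter(HYPERNYMS.get(w, w) for w in words)
        total = sum(counts.values())
        vectors.append({t: n / total for t, n in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iters=10):
    """Master: cluster sparse vectors with k-means under cosine similarity (assumed choice)."""
    # Farthest-first seeding: each new seed is the document least similar to existing seeds.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each non-empty centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                acc = Counter()
                for m in members:
                    acc.update(m)
                centroids[c] = {t: w / len(members) for t, w in acc.items()}
    return labels


def cluster_corpus(docs, k, n_slaves=2, mapper=map):
    chunks = partition(docs, n_slaves)                 # master: data partitioning
    results = mapper(slave_task, chunks)               # slaves: statistics, weights, hypernyms
    vectors = [v for chunk in results for v in chunk]  # master: collect, order preserved
    return kmeans(vectors, k)                          # master: cluster the result


def run_parallel(docs, k, n_slaves=4):
    """Run slave_task in worker processes; call under an `if __name__ == "__main__":` guard."""
    with ProcessPoolExecutor(max_workers=n_slaves) as pool:
        return cluster_corpus(docs, k, n_slaves, mapper=pool.map)
```

`cluster_corpus` runs the slave step sequentially with the built-in `map`, which is convenient for testing; `run_parallel` passes `ProcessPoolExecutor.map` instead, so each chunk is processed in its own worker process while the master keeps partitioning, collection, and clustering. Because `slave_task` maps "horse" and "cat" to the same hypernym, a document such as "a horse ran" can land in the same cluster as "the cat chased the dog" even though the two share no surface words, which is the effect the semantic tree is meant to provide.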
Keywords :
computational complexity; parallel algorithms; pattern clustering; statistics; text analysis; trees (mathematics); word processing; WordNet; data partitioning; information collection; information sending; multiprocesses model; parallel algorithm; parallel text clustering algorithm; semantic tree; time complexity reduction; word frequency statistics; word hypernyms; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Parallel algorithms; Partitioning algorithms; Semantics; Parallel Algorithm; Semantic Tree; Text Clustering; WordNet;
Conference_Title :
Computer Sciences and Convergence Information Technology (ICCIT), 2011 6th International Conference on
Conference_Location :
Seogwipo
Print_ISBN :
978-1-4577-0472-7