Title :
TCBLHT: a new method of hierarchical text clustering
Author :
Xu, Jian-Suo; Wang, Li
Author_Institution :
School of Economy & Management, Henan Normal University, Xinxiang, China
Abstract :
This paper presents a new method of hierarchical text clustering, called the TCBLHT method, based on a combination of latent semantic analysis (LSA) and hierarchical TGSOM. Clustering results produced by traditional methods cannot show hierarchical structure, even though hierarchical structure is very important in text clustering. The TCBLHT method performs hierarchical text clustering automatically. It builds a term-weight vector space model (VSM) using the theory of LSA, so that semantic relations are incorporated into the vector space model. Both theoretical analysis and experimental results confirm that the TCBLHT method reduces the number of vectors and improves both the efficiency and the precision of text clustering.
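The LSA step described in the abstract can be illustrated with a minimal sketch of generic latent semantic analysis: a truncated singular value decomposition (SVD) of a term-document weight matrix, which maps each document into a low-dimensional "semantic" space where documents sharing related terms end up close together. This is not the authors' implementation. The term weighting scheme, the choice of the number of latent dimensions k, and the hierarchical TGSOM clustering stage are not specified in the abstract, so they are omitted or assumed here. The function names `lsa_document_vectors` and `cosine` and the toy matrix are hypothetical, for illustration only.

```python
import numpy as np


def lsa_document_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Project documents into a k-dimensional latent semantic space.

    term_doc: (n_terms, n_docs) term-weight matrix. The weighting scheme
    (e.g. raw counts or tf-idf) is an assumption, not taken from the paper.
    Returns an (n_docs, k) matrix whose rows are the reduced document vectors.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each document becomes a row of V_k S_k.
    return (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy example (hypothetical data): rows are terms, columns are 4 documents.
# Documents 0-1 use one group of terms, documents 2-3 use another.
term_doc = np.array([
    [2.0, 1.0, 0.0, 0.0],  # term "cluster"
    [1.0, 2.0, 0.0, 0.0],  # term "text"
    [0.0, 0.0, 1.0, 2.0],  # term "market"
    [0.0, 0.0, 2.0, 1.0],  # term "price"
])

docs = lsa_document_vectors(term_doc, k=2)
print(docs.shape)                              # (4, 2)
print(round(cosine(docs[0], docs[1]), 3))      # 1.0: same latent topic
# cosine(docs[0], docs[2]) is close to 0: different latent topics
```

In this sketch the reduced vectors (rather than the original term-weight vectors) would be the input to the subsequent clustering stage. Dot products between rows of the returned matrix equal those between columns of the rank-k approximation of the term-document matrix, which is why LSA can shorten the vectors while preserving the dominant co-occurrence structure.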
Keywords :
computational linguistics; data mining; pattern clustering; text analysis; vectors; LSA; TCBLHT; hierarchical TGSOM; hierarchical text clustering; latent semantic analysis; vector space model; Clustering methods; Frequency; Functional analysis; Machine learning; Matrix decomposition; Singular value decomposition; Statistics; Technology management; Text mining; Text Clustering
Conference_Title :
Proceedings of 2005 International Conference on Machine Learning and Cybernetics
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
DOI :
10.1109/ICMLC.2005.1527306