DocumentCode :
2543810
Title :
Enhancing GSOM text clustering with Latent Semantic Analysis
Author :
Matharage, Sumith ; Alahakoon, Damminda
Author_Institution :
Clayton Sch. of Inf. Technol., Monash Univ., Clayton, VIC, Australia
fYear :
2010
fDate :
17-19 Dec. 2010
Firstpage :
441
Lastpage :
446
Abstract :
The Growing Self Organizing Map (GSOM) has proven benefits in text clustering. Latent Semantic Analysis (LSA) has also been used in text clustering to capture the latent concepts in text. This paper presents a novel combination of GSOM and LSA that improves text clustering results compared with using GSOM on its own. LSA is an inherently global algorithm that looks at trends and patterns across the whole corpus, whereas GSOM is a nearest-neighborhood-based algorithm that looks at local patterns. Combining the two makes it possible to discover both global and local patterns. In the proposed model, the initial text corpus is converted into its vector space representation using the traditional Term Frequency - Inverse Document Frequency (TF-IDF) technique. Singular Value Decomposition (SVD), followed by the Frobenius norm, is then applied to the resulting high-dimensional vector to produce a new vector with an optimal number of dimensions. Experiments using the proposed model were conducted and compared with the original GSOM under the same conditions. The experimental results demonstrate that the new combination of these well-known techniques improves both the accuracy of the clustering results and the computational time relative to GSOM alone.
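The preprocessing steps named in the abstract (TF-IDF, then SVD, then a Frobenius-norm criterion for the number of dimensions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the Frobenius norm is used, so the sketch assumes one common reading, namely keeping the smallest rank k whose truncated SVD retains a chosen fraction of the matrix's Frobenius norm. The function names, the `keep` threshold, and the smoothed IDF formula are all hypothetical choices, and the GSOM stage itself is omitted.

```python
import numpy as np

def tfidf(docs):
    """Build a (documents x terms) TF-IDF matrix from whitespace-tokenized text."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            tf[i, index[w]] += 1
    df = (tf > 0).sum(axis=0)            # number of documents containing each term
    idf = np.log(len(docs) / df) + 1.0   # assumed smoothing; other IDF variants exist
    return tf * idf, vocab

def choose_rank(singular_values, keep=0.9):
    """Assumed criterion: smallest k retaining `keep` of the Frobenius norm."""
    energy = np.cumsum(singular_values ** 2)
    ratio = np.sqrt(energy / energy[-1])  # ||X_k||_F / ||X||_F for k = 1, 2, ...
    k = int(np.searchsorted(ratio, keep) + 1)
    return min(k, len(singular_values))

def lsa_reduce(X, keep=0.9):
    """Project documents onto the top-k latent dimensions found by SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = choose_rank(s, keep)
    return U[:, :k] * s[:k], k

docs = [
    "self organizing map clusters text",
    "growing map clusters documents",
    "latent semantic analysis of text",
    "semantic analysis finds latent concepts",
]
X, vocab = tfidf(docs)
Z, k = lsa_reduce(X, keep=0.9)
```

The reduced matrix `Z` has one row per document and `k` columns, and is what would be fed to a clustering algorithm such as GSOM in place of the full TF-IDF matrix.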
Keywords :
data mining; pattern clustering; self-organising feature maps; singular value decomposition; text analysis; Frobenius norm; GSOM text clustering; growing self organizing map; latent semantic analysis; singular value decomposition; term frequency inverse document frequency; text corpus; text mining; vector space; Accuracy; Artificial neural networks; Clustering algorithms; Matrix decomposition; Neurons; Organizing; Semantics; Growing Self Organizing Map; Latent Semantic Analysis; text clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Automation for Sustainability (ICIAFs), 2010 5th International Conference on
Conference_Location :
Colombo
Print_ISBN :
978-1-4244-8549-9
Type :
conf
DOI :
10.1109/ICIAFS.2010.5715702
Filename :
5715702