Title :
An Efficient Method of Genetic Algorithm for Text Clustering Based on Singular Value Decomposition
Author :
Song, Wei ; Park, Soon Cheol
Author_Institution :
Chonbuk Nat. Univ., Jeonju
Abstract :
In this paper, we propose a genetic algorithm (GA) method for text clustering based on the singular value decomposition (SVD) technique. The main difficulty in applying GA to text clustering is the long string representation required in a high-dimensional space, because the most straightforward and popular approach represents texts with the vector space model (VSM), in which each unique term in the vocabulary represents one dimension. SVD is a successful technique from numerical linear algebra that is used in latent semantic indexing (LSI). By employing the SVD-based document representation, LSI overcomes this problem: it uses statistically derived conceptual indices instead of individual words and provides a dimension-reduced space. Genetic algorithms are search techniques that can automatically search for the optimal solution of the objective (fitness) function of an optimization problem. GA can therefore be used in conjunction with the reduced latent semantic structure to improve clustering efficiency and accuracy. Our algorithm is evaluated on the Reuters document collection. The results show that the performance of the SVD-based GA is significantly superior to that of the conventional GA in the vector space model.
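The dimension reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term-document matrix, the helper name `lsi_reduce`, the choice of k = 2, and the centroid-based chromosome encoding (one centroid vector per cluster) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Project documents (columns of term_doc) into a k-dimensional latent space.

    Hypothetical helper: uses a thin SVD, term_doc = U @ diag(s) @ Vt,
    and keeps only the k largest singular values.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each row of the result holds one document's latent coordinates.
    return (s[:k, None] * Vt[:k, :]).T

# Toy term-document matrix (made-up counts): 6 terms x 4 documents.
# Documents 0-1 share one vocabulary, documents 2-3 share another.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [0, 0, 1, 1],
], dtype=float)

docs_k = lsi_reduce(A, k=2)

# Assumed encoding: a chromosome concatenates one centroid per cluster,
# so its length is (number of clusters) x (number of dimensions).
n_clusters = 2
vsm_len = n_clusters * A.shape[0]       # centroids in term space
lsi_len = n_clusters * docs_k.shape[1]  # centroids in latent space

print("chromosome length in VSM:", vsm_len)
print("chromosome length after SVD:", lsi_len)
```

Under this assumed encoding, the string a GA has to evolve shrinks from one gene per vocabulary term per cluster to one gene per latent dimension per cluster, which is the efficiency argument the abstract makes. A fitness function and the genetic operators would still have to be defined on top of the reduced vectors.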
Keywords :
genetic algorithms; pattern clustering; singular value decomposition; text analysis; Reuter document collection; genetic algorithm; latent semantic indexing; numerical linear algebra; optimization problem; singular value decomposition; statistical analysis; string representation; text clustering; vector space model; Clustering algorithms; Computational efficiency; Genetic algorithms; Genetic engineering; Indexing; Information technology; Large scale integration; Singular value decomposition; Vectors; Vocabulary;
Conference_Title :
Computer and Information Technology, 2007. CIT 2007. 7th IEEE International Conference on
Conference_Location :
Aizu-Wakamatsu, Fukushima
Print_ISBN :
978-0-7695-2983-7
DOI :
10.1109/CIT.2007.197