DocumentCode :
2755600
Title :
Application of Genetic Algorithm in Document Clustering
Author :
Jian-Xiang, Wei ; Huai, Liu ; Yue-hong, Sun ; Xin-Ning, Su
Author_Institution :
Dept. of Inf. Sci., Nanjing Coll. for Population Programme Manage., Nanjing, China
Volume :
1
fYear :
2009
fDate :
25-26 July 2009
Firstpage :
145
Lastpage :
148
Abstract :
After surveying existing methods for document clustering, we put forward a new dynamic method based on the genetic algorithm (GA). K-means is a greedy algorithm that is sensitive to the choice of cluster centers and easily converges to a local optimum. The genetic algorithm is a globally convergent algorithm that can find the best cluster centers easily. In traditional document clustering methods, the document similarity matrix is sparse. In this paper, we propose new formulas that improve on the traditional method, and we then make several improvements to the genetic algorithm. All individuals are encoded as floating-point numbers, and the sum of the mean square deviation of intra-class distance is adopted as the objective function. The steps of the algorithm are given in detail. Experimental results show that the accuracy of the GA can exceed 98 percent and that it generates better clustering results than K-means.
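The abstract does not give the formulas or operators, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each individual is a flat list of floating-point numbers encoding k cluster centers, and it reads the objective as the sum, over clusters, of the mean squared distance from each member document to its center. The function names, the arithmetic crossover, the Gaussian mutation, and all parameter values are hypothetical choices made for illustration.

```python
import random

def decode(chromosome, k, dim):
    """Split a flat list of k*dim floats into k cluster centers."""
    return [chromosome[i * dim:(i + 1) * dim] for i in range(k)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def objective(chromosome, docs, k):
    """Assumed objective: sum over clusters of the mean squared
    distance from member documents to their center (lower is better).
    Empty clusters contribute nothing."""
    centers = decode(chromosome, k, len(docs[0]))
    members = [[] for _ in range(k)]
    for doc in docs:
        dists = [sq_dist(doc, c) for c in centers]
        j = dists.index(min(dists))      # assign to the nearest center
        members[j].append(dists[j])
    return sum(sum(m) / len(m) for m in members if m)

def evolve(docs, k, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Toy GA: elitist selection, arithmetic crossover, Gaussian mutation."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Each initial individual concatenates k randomly chosen documents.
    pop = [[x for doc in rng.sample(docs, k) for x in doc]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: objective(c, docs, k))
        elite = pop[:pop_size // 2]      # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            # Mutate each gene with probability 0.1.
            child = [g + rng.gauss(0, sigma) if rng.random() < 0.1 else g
                     for g in child]
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda c: objective(c, docs, k))
    return decode(best, k, dim), objective(best, docs, k)
```

In this sketch, `docs` would be a list of equal-length numeric document vectors (for example term weights), and `evolve(docs, k)` returns the best centers found together with their objective value. Because the better half of the population is carried over unchanged, the best objective value never gets worse from one generation to the next.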
Keywords :
document handling; genetic algorithms; greedy algorithms; number theory; pattern clustering; sparse matrices; document clustering; document similar matrix; encoding; floating-point number; genetic algorithm; global convergence algorithm; greedy algorithm; intra-class distance; local optimization; mean square deviation; objective function; sparse matrix; Application software; Clustering algorithms; Clustering methods; Computer science; Genetic algorithms; Greedy algorithms; Information science; Information technology; Sparse matrices; Sun; Document Clustering; Genetic Algorithm; Optimal Cluster Center;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology and Computer Science, 2009. ITCS 2009. International Conference on
Conference_Location :
Kiev
Print_ISBN :
978-0-7695-3688-0
Type :
conf
DOI :
10.1109/ITCS.2009.269
Filename :
5190037