Title :
A New Algorithm for Text Clustering Based on Projection Pursuit
Author :
Gao, Mao-Ting ; Wang, Zheng-Ou
Author_Institution :
Shanghai Maritime Univ., Shanghai
Abstract :
The Vector Space Model (VSM) is commonly used to represent text features in text mining, but its very high dimensionality makes computation expensive and obscures the structure of the text set. A new text clustering algorithm based on projection pursuit is proposed. By minimizing (or maximizing) a projection index, projection pursuit searches for an optimal projection direction and projects text feature vectors from the high-dimensional space into a low-dimensional (1 to 3 dimensions) space. The linear and non-linear structures and features of the original high-dimensional data can be expressed by its projection weights in the optimal projection direction. The optimal projection direction is found with a genetic algorithm, and the distribution of the texts can be visualized. Unlike k-means clustering, projection pursuit based text clustering does not require the number of clusters to be set in advance, and unlike latent semantic analysis, which discovers only linear structure, it also reveals non-linear structure. Experiments demonstrate that the algorithm clusters texts effectively.
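The search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the projection index or the genetic operators, so the spread-times-local-density index (a Friedman-Tukey-style choice), the blend crossover, the Gaussian mutation, and every name and parameter below (`projection_index`, `search_direction`, `radius_scale`, `pop_size`, and so on) are assumptions made for illustration. It searches for a single direction only, giving a 1-dimensional projection.

```python
import numpy as np


def projection_index(X, a, radius_scale=0.1):
    """Assumed index: spread of the projection times its local density."""
    z = X @ a                                  # 1-D projection of every document
    spread = z.std()
    if spread == 0:
        return 0.0
    radius = radius_scale * spread             # neighbourhood radius (assumption)
    d = np.abs(z[:, None] - z[None, :])        # pairwise distances between projections
    density = np.sum((radius - d) * (d < radius))
    return float(spread * density)


def normalize(P):
    """Scale each row to unit length so it is a valid direction."""
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    return P / np.where(norms == 0, 1.0, norms)


def search_direction(X, pop_size=40, generations=60, mutation_scale=0.1, seed=0):
    """Simple genetic search that maximizes projection_index over unit vectors."""
    rng = np.random.default_rng(seed)
    pop = normalize(rng.normal(size=(pop_size, X.shape[1])))
    for _ in range(generations):
        fitness = np.array([projection_index(X, a) for a in pop])
        order = np.argsort(fitness)[::-1]
        elite = pop[order[: pop_size // 2]]    # selection: keep the best half
        n_children = pop_size - len(elite)
        i = rng.integers(0, len(elite), size=n_children)
        j = rng.integers(0, len(elite), size=n_children)
        w = rng.random((n_children, 1))
        children = w * elite[i] + (1 - w) * elite[j]                        # crossover
        children += rng.normal(scale=mutation_scale, size=children.shape)  # mutation
        pop = normalize(np.vstack([elite, children]))
    fitness = np.array([projection_index(X, a) for a in pop])
    best = int(np.argmax(fitness))
    return pop[best], float(fitness[best])


if __name__ == "__main__":
    # Toy stand-in for a document-term matrix: 30 documents, 8 features.
    X = np.random.default_rng(1).random((30, 8))
    direction, score = search_direction(X, pop_size=10, generations=5)
    projected = X @ direction                  # one coordinate per document
    print(projected.shape, score)
```

Under these assumptions, documents whose projected values bunch together would be read as candidate clusters, so no cluster count is supplied up front. Repeating the search for two or three directions would give the 2- or 3-dimensional views mentioned in the abstract.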
Keywords :
genetic algorithms; pattern clustering; text analysis; vectors; dimension reduction; genetic algorithm; optimal projection direction; projecting index; projection pursuit; text clustering; text feature vectors; Clustering algorithms; Cybernetics; Data mining; Data visualization; Feature extraction; Machine learning; Machine learning algorithms; Pursuit algorithms; Text mining;
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
DOI :
10.1109/ICMLC.2007.4370736