Title :
A comparative study on unsupervised feature selection methods for text clustering
Author :
Liu, Luying ; Kang, Jianchu ; Yu, Jing ; Wang, Zhongliang
Author_Institution :
Sch. of Comput. Sci., Beijing Univ., China
fDate :
30 Oct.-1 Nov. 2005
Abstract :
Text clustering is one of the central problems in text mining and information retrieval. The high dimensionality of the feature space and the inherent sparsity of the data dramatically degrade the performance of clustering algorithms. Two techniques are used to deal with this problem: feature extraction and feature selection. Feature selection methods have been applied successfully to text categorization but seldom to text clustering, because class label information is unavailable. This paper introduces four unsupervised feature selection methods: DF, TC, TVQ, and a newly proposed method, TV. Experiments show that feature selection can improve both the efficiency and the accuracy of text clustering. Three clustering validity criteria are studied and used to evaluate the clustering results.
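As an illustration of what an unsupervised selection criterion can look like, the sketch below ranks terms by document frequency, which is the usual reading of DF: the number of documents in which a term occurs. No class labels are needed. This is a minimal sketch and not the paper's implementation; the function names, the toy documents, and the choice to keep the k highest-DF terms are assumptions made for the example.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document,
        # however many times it occurs in that document.
        df.update(set(tokens))
    return df


def select_by_df(docs, k):
    """Keep the k terms with the highest document frequency (assumed policy)."""
    df = document_frequency(docs)
    # Sort by descending DF, then alphabetically so ties are deterministic.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return ranked[:k]


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["text", "clustering", "feature"],
    ["feature", "selection", "text"],
    ["clustering", "text", "text"],
]
selected = select_by_df(docs, 2)
```

The reduced vocabulary in `selected` would then define the feature space handed to a clustering algorithm. Real pipelines often also discard terms that appear in nearly every document, which this sketch does not attempt.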
Keywords :
data mining; feature extraction; pattern clustering; text analysis; unsupervised learning; data sparsity; feature extraction; feature space dimensionality; information retrieval; text clustering; text mining; unsupervised feature selection; Clustering algorithms; Computer science; Feature extraction; Frequency; Information retrieval; Navigation; Principal component analysis; TV; Text categorization; Text mining;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598807