DocumentCode :
3318187
Title :
A comparative study on unsupervised feature selection methods for text clustering
Author :
Liu, Luying ; Kang, Jianchu ; Yu, Jing ; Wang, Zhongliang
Author_Institution :
Sch. of Comput. Sci., Beijing Univ., China
fYear :
2005
fDate :
30 Oct.-1 Nov. 2005
Firstpage :
597
Lastpage :
601
Abstract :
Text clustering is one of the central problems in text mining and information retrieval. Because of the high dimensionality of the feature space and the inherent data sparsity, the performance of clustering algorithms declines dramatically. Two techniques are used to deal with this problem: feature extraction and feature selection. Feature selection methods have been successfully applied to text categorization but seldom to text clustering, because class label information is unavailable. In this paper, four unsupervised feature selection methods are introduced: DF, TC, TVQ, and a newly proposed method, TV. Experiments show that feature selection can improve both the efficiency and the accuracy of text clustering. Three clustering validity criteria are studied and used to evaluate the clustering results.
Keywords :
data mining; feature extraction; pattern clustering; text analysis; unsupervised learning; data sparsity; feature extraction; feature space dimensionality; information retrieval; text clustering; text mining; unsupervised feature selection; Clustering algorithms; Computer science; Feature extraction; Frequency; Information retrieval; Navigation; Principal component analysis; TV; Text categorization; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
Type :
conf
DOI :
10.1109/NLPKE.2005.1598807
Filename :
1598807