DocumentCode :
3282404
Title :
Fuzzy C-Means Text Clustering with Supervised Feature Selection
Author :
Wang, Wei ; Wang, Chunheng ; Cui, Xia ; Wang, Ai
Author_Institution :
Key Lab. of Complex Syst. & Intell. Sci., Chinese Acad. of Sci., Beijing
Volume :
1
fYear :
2008
fDate :
18-20 Oct. 2008
Firstpage :
57
Lastpage :
61
Abstract :
Traditional text clustering algorithms usually rely on unsupervised feature selection methods to select features. In this paper we propose a new text clustering algorithm, SFFCM, which uses a supervised feature selection method to select features. SFFCM is based on the EM algorithm. In the E-step, to compute the expectation, we use the supervised feature selection algorithm to calculate a relevance score for each term. In the M-step, we use the fuzzy c-means (FCM) algorithm to obtain clustering results based on the selected terms. Experimental results on the standard document clustering benchmark corpora OHSUMED, 20-Newsgroups and Reuters-21578 show that SFFCM produces better clustering results than the other clustering methods used for comparison, and that supervised feature selection improves the performance of the text clustering algorithm. We also propose a supervised feature selection measure, CRF-CHI, which is based on the χ² statistic and the category relative frequency (CRF). The experimental results also confirm that CRF-CHI is an effective supervised feature selection measure.
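The alternating loop described in the abstract (score terms with a supervised measure, then re-run FCM on the selected terms) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' SFFCM. It assumes documents are dense term-count vectors, that FCM uses squared Euclidean distance with fuzzifier m = 2 and a fixed iteration count, and that fuzzy memberships are hardened into pseudo-labels by taking the largest membership. Terms are scored with the standard χ² statistic (maximum over clusters). The category-relative-frequency weighting that distinguishes CRF-CHI is not specified in the abstract and is not reproduced. The names fcm, chi2_scores and sffcm_sketch are hypothetical.

```python
import random


def fcm(X, c, m=2.0, iters=50, seed=0):
    """Standard fuzzy c-means on a list of dense term vectors.
    Returns (U, centers); U[i][k] is the membership of document i in cluster k.
    Assumption (not from the paper): squared Euclidean distance, fixed iterations."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the documents weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * X[i][j] for i in range(n)) / sw
                            for j in range(d)])
        # Membership update: u_ik = 1 / sum_l (D_ik / D_il) ** (1 / (m - 1)),
        # where D is the squared distance, floored to avoid division by zero.
        for i in range(n):
            dist = [max(sum((X[i][j] - ctr[j]) ** 2 for j in range(d)), 1e-12)
                    for ctr in centers]
            U[i] = [1.0 / sum((dist[k] / dist[l]) ** (1.0 / (m - 1))
                              for l in range(c))
                    for k in range(c)]
    return U, centers


def chi2_scores(X, labels, c):
    """Plain chi-square term score (max over clusters) from 2x2 document counts.
    Hypothetical stand-in: CRF-CHI's category relative frequency is omitted."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        has = [X[i][j] > 0 for i in range(n)]
        best = 0.0
        for k in range(c):
            a = sum(1 for i in range(n) if has[i] and labels[i] == k)
            b = sum(1 for i in range(n) if has[i] and labels[i] != k)
            cc = sum(1 for i in range(n) if not has[i] and labels[i] == k)
            d = n - a - b - cc
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:  # skip degenerate tables (e.g. an empty cluster)
                best = max(best, n * (a * d - cc * b) ** 2 / denom)
        scores.append(best)
    return scores


def sffcm_sketch(X, c, n_terms, rounds=3):
    """Hypothetical alternating loop: FCM on the selected terms, harden the
    memberships into pseudo-labels, re-score all terms, keep the top n_terms."""
    selected = list(range(len(X[0])))  # first round uses every term
    labels = []
    for _ in range(rounds):
        Xs = [[row[j] for j in selected] for row in X]
        U, _ = fcm(Xs, c)
        labels = [max(range(c), key=lambda k: U[i][k]) for i in range(len(X))]
        scores = chi2_scores(X, labels, c)
        selected = sorted(range(len(scores)), key=lambda j: -scores[j])[:n_terms]
    return labels, selected
```

For example, `sffcm_sketch(X, c=2, n_terms=2)` on a small term-count matrix returns hard cluster labels and the indices of the retained terms. A practical text-clustering setup would more likely use tf-idf weighting and cosine similarity; the abstract does not say which representation the authors used.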
Keywords :
document image processing; expectation-maximisation algorithm; fuzzy systems; pattern clustering; text analysis; 20-Newsgroups; EM algorithm; OHSUMED; Reuters-21578; fuzzy c-means text clustering; standard document clustering benchmark; supervised feature selection; Automation; Clustering algorithms; Clustering methods; Frequency measurement; Fuzzy systems; Intelligent systems; Iterative methods; Laboratories; Statistics; Text categorization; Clustering; Feature Selection; Fuzzy C means;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
Type :
conf
DOI :
10.1109/FSKD.2008.161
Filename :
4665939