DocumentCode :
3188334
Title :
Document Clustering based on mutual information and PCA subspace
Author :
Yang, Jiangfeng ; Ma, Zheng
Author_Institution :
Dept. of Commun. & Inf. Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
fYear :
2011
fDate :
8-10 Aug. 2011
Firstpage :
2983
Lastpage :
2986
Abstract :
Mutual information is a criterion widely used in statistical language modeling of word associations and in feature selection. Principal component analysis (PCA) is a statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering method for unsupervised learning tasks. In this paper, we first select features from the native feature space by mutual information to reduce the data dimension, then decompose the covariance matrix of the new word-document matrix via singular value decomposition (SVD), and finally use K-means to cluster the documents in the PCA subspace. The clustering results are analyzed comprehensively under different conditions.
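The three-stage pipeline summarized in the abstract (mutual-information feature selection, PCA via SVD of the covariance matrix, then K-means in the PCA subspace) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which variables the mutual information is measured between, so the sketch assumes MI between binary term presence and known class labels of a labeled training set. It uses a document-by-term matrix (the transpose of the word-document orientation named in the abstract) and seeds K-means with a deterministic farthest-first traversal. All function names (`mutual_information_scores`, `pca_project`, `farthest_first_init`, `kmeans`, `cluster_documents`) and the toy data are hypothetical.

```python
import numpy as np


def mutual_information_scores(X, labels):
    """Per-term MI (nats) between binary term presence and the class label.

    Assumption: labels come from a labeled training set; the source abstract
    does not specify what the mutual information is measured against.
    """
    present = X > 0                      # (n_docs, n_terms) term occurrence
    scores = np.zeros(X.shape[1])
    for c in np.unique(labels):
        in_c = labels == c
        p_c = in_c.mean()                # P(C = c)
        for t_val in (True, False):
            mask_t = present == t_val
            p_t = mask_t.mean(axis=0)    # P(T = t_val) for every term
            p_tc = (mask_t & in_c[:, None]).mean(axis=0)  # joint probability
            with np.errstate(divide="ignore", invalid="ignore"):
                contrib = p_tc * np.log(p_tc / (p_t * p_c))
            scores += np.nan_to_num(contrib)  # treat 0 * log 0 as 0
    return scores


def pca_project(X, n_components):
    """Project rows of X onto the top principal axes via SVD of the covariance."""
    Xc = X - X.mean(axis=0)                    # center each term column
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)        # term-by-term covariance matrix
    U, _, _ = np.linalg.svd(cov)               # columns of U: principal axes
    return Xc @ U[:, :n_components]


def farthest_first_init(Z, k):
    """Deterministic seeding: start at row 0, then repeatedly take the farthest point."""
    idx = [0]
    d = ((Z - Z[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        idx.append(nxt)
        d = np.minimum(d, ((Z - Z[nxt]) ** 2).sum(axis=1))
    return Z[idx].copy()


def kmeans(Z, k, n_iter=100):
    """Lloyd's K-means iterations on the rows of Z."""
    centers = farthest_first_init(Z, k)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)           # nearest center per document
        new_centers = np.array([
            Z[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign, centers


def cluster_documents(X, labels, n_terms, n_components, n_clusters):
    scores = mutual_information_scores(X, labels)
    keep = np.sort(np.argsort(scores)[::-1][:n_terms])   # highest-MI terms
    Z = pca_project(X[:, keep].astype(float), n_components)
    assign, _ = kmeans(Z, n_clusters)
    return assign, keep


# Toy document-by-term count matrix: docs 0-2 use terms 0-1, docs 3-5 use
# terms 2-3, and term 4 appears once in every document.
X = np.array([
    [3, 1, 0, 0, 1],
    [1, 3, 0, 0, 1],
    [2, 2, 0, 0, 1],
    [0, 0, 3, 1, 1],
    [0, 0, 1, 3, 1],
    [0, 0, 2, 2, 1],
])
labels = np.array([0, 0, 0, 1, 1, 1])

assign, keep = cluster_documents(X, labels, n_terms=4, n_components=2, n_clusters=2)
print("kept terms:", keep.tolist())   # kept terms: [0, 1, 2, 3]
print("clusters:", assign.tolist())   # clusters: [0, 0, 0, 1, 1, 1]
```

Because the covariance matrix is symmetric and positive semidefinite, its left singular vectors are eigenvectors, so `np.linalg.svd(cov)` returns principal axes ordered by decreasing variance. In the toy data, term 4 occurs in every document, carries zero mutual information with the labels, and is discarded before PCA.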
Keywords :
document handling; pattern clustering; principal component analysis; K-means clustering; PCA subspace; covariance matrix; data clustering; data dimension; document clustering; feature selection; mutual information; principle component analysis; singular value decomposition; statistical language modeling word associations; unsupervised dimension reduction; unsupervised learning task; word-document matrix; Accuracy; Clustering algorithms; Covariance matrix; Eigenvalues and eigenfunctions; Mutual information; Principal component analysis; Training; PCA subspace; dimension reduction; document clustering;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), 2011 2nd International Conference on
Conference_Location :
Deng Leng
Print_ISBN :
978-1-4577-0535-9
Type :
conf
DOI :
10.1109/AIMSEC.2011.6011352
Filename :
6011352