DocumentCode
3188334
Title
Document Clustering based on mutual information and PCA subspace
Author
Yang, Jiangfeng ; Ma, Zheng
Author_Institution
Dept. of Commun. & Inf. Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
fYear
2011
fDate
8-10 Aug. 2011
Firstpage
2983
Lastpage
2986
Abstract
Mutual information is a criterion widely used in statistical language modeling of word associations and in feature selection. Principal component analysis (PCA) is a statistical technique for unsupervised dimension reduction. K-means is a commonly used clustering algorithm for unsupervised learning tasks. In this paper, we first select features from the native feature space by mutual information to reduce the data dimension, then decompose the covariance matrix of the new word-document matrix via singular value decomposition (SVD), and finally apply K-means to cluster the documents in the PCA subspace. The clustering results are analyzed comprehensively under different conditions.
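The three-stage pipeline named in the abstract (mutual-information feature selection, PCA via SVD of the covariance matrix, K-means in the reduced subspace) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The abstract does not say how mutual information is computed, so the sketch assumes class labels are available for scoring terms. It also assumes documents are stored as rows of the matrix. The function names (`mutual_information`, `pca_project`, `kmeans`, `cluster_documents`) and all parameters are hypothetical.

```python
import numpy as np


def mutual_information(X, y):
    """Score each term by MI between its presence/absence and the class label.

    X: (n_docs, n_terms) term counts. y: (n_docs,) integer labels (assumed available).
    """
    present = (X > 0).astype(float)
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_c = (y == c).astype(float)
        p_c = in_c.mean()                                # P(class = c)
        for ind in (present, 1.0 - present):             # term present, then absent
            p_t = ind.mean(axis=0)                       # P(term state)
            p_tc = (ind * in_c[:, None]).mean(axis=0)    # P(term state, class = c)
            nz = p_tc > 0                                # skip zero-probability cells
            scores[nz] += p_tc[nz] * np.log(p_tc[nz] / (p_t[nz] * p_c))
    return scores


def pca_project(X, n_components):
    """Project documents onto the leading eigenvectors of the term covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)        # (n_terms, n_terms), terms are the variables
    U, _, _ = np.linalg.svd(cov)         # columns of U sorted by decreasing variance
    return Xc @ U[:, :n_components]


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_documents(X, y, n_terms, n_components, k):
    """Select the n_terms highest-MI terms, reduce with PCA, then run K-means."""
    selected = np.argsort(mutual_information(X, y))[::-1][:n_terms]
    Z = pca_project(X[:, selected], n_components)
    return kmeans(Z, k), selected
```

A call such as `cluster_documents(X, y, n_terms=500, n_components=20, k=5)` would return the cluster assignment of each document together with the indices of the retained terms; the specific values here are placeholders, not settings reported in the paper.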
Keywords
document handling; pattern clustering; principal component analysis; K-means clustering; PCA subspace; covariance matrix; data clustering; data dimension; document clustering; feature selection; mutual information; singular value decomposition; statistical language modeling word associations; unsupervised dimension reduction; unsupervised learning task; word-document matrix; Accuracy; Clustering algorithms; Eigenvalues and eigenfunctions; Training; dimension reduction
fLanguage
English
Publisher
IEEE
Conference_Titel
Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), 2011 2nd International Conference on
Conference_Location
Deng Leng
Print_ISBN
978-1-4577-0535-9
Type
conf
DOI
10.1109/AIMSEC.2011.6011352
Filename
6011352