  • DocumentCode
    3188334
  • Title
    Document Clustering based on mutual information and PCA subspace
  • Author
    Yang, Jiangfeng ; Ma, Zheng
  • Author_Institution
    Dept. of Commun. & Inf. Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
  • fYear
    2011
  • fDate
    8-10 Aug. 2011
  • Firstpage
    2983
  • Lastpage
    2986
  • Abstract
    Mutual information is a criterion widely used in statistical language modeling of word associations and in feature selection. Principal component analysis (PCA) is a statistical technique for unsupervised dimension reduction. K-means is a commonly used clustering method for unsupervised learning tasks. In this paper, we first select features from the native feature space by mutual information to reduce the data dimension, then decompose the covariance matrix of the new word-document matrix via singular value decomposition (SVD), and finally use K-means to cluster in the PCA subspace. The clustering results are analyzed comprehensively under different conditions.
  • Keywords
    document handling; pattern clustering; principal component analysis; K-means clustering; PCA subspace; covariance matrix; data clustering; data dimension; document clustering; feature selection; mutual information; singular value decomposition; statistical language modeling word associations; unsupervised dimension reduction; unsupervised learning task; word-document matrix; Accuracy; Clustering algorithms; Eigenvalues and eigenfunctions; Training; dimension reduction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), 2011 2nd International Conference on
  • Conference_Location
    Deng Leng
  • Print_ISBN
    978-1-4577-0535-9
  • Type
    conf
  • DOI
    10.1109/AIMSEC.2011.6011352
  • Filename
    6011352
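
The pipeline summarised in the abstract (mutual-information feature selection, SVD of the covariance matrix, K-means in the PCA subspace) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say how mutual information is computed, so the sketch assumes it is measured between a term's presence and a known category label on a small labelled set. The toy word-document counts, the helper names (`mutual_information`, `pca_project`, `kmeans`), the number of retained terms and components, and the farthest-point initialisation of K-means are all hypothetical choices made for the example.

```python
import numpy as np

def mutual_information(X, y):
    """MI (in nats) between each term's presence and the class label (assumed setup)."""
    present = X > 0                              # documents x terms, boolean
    n_docs = X.shape[0]
    p_t1 = present.sum(axis=0) / n_docs          # P(term present)
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_c = y == c
        n_c = in_c.sum()
        n_t1_c = present[in_c].sum(axis=0)       # term present, class c
        n_t0_c = n_c - n_t1_c                    # term absent, class c
        for n_joint, p_t in ((n_t1_c, p_t1), (n_t0_c, 1.0 - p_t1)):
            joint = n_joint / n_docs
            with np.errstate(divide="ignore", invalid="ignore"):
                term = joint * np.log(joint / (p_t * (n_c / n_docs)))
            scores += np.where(joint > 0, term, 0.0)   # treat 0 * log 0 as 0
    return scores

def pca_project(X, n_components):
    """Project documents onto the leading singular vectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)                      # centre each term column
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # terms x terms covariance
    U, _, _ = np.linalg.svd(cov)                 # columns of U are principal directions
    return Xc @ U[:, :n_components]

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy word-document matrix (rows: documents, columns: terms); values are made up.
# Hypothetical terms: goal, match, stock, market, the
X = np.array([
    [3, 2, 0, 0, 4],
    [2, 3, 0, 0, 5],
    [4, 1, 0, 0, 4],
    [0, 0, 3, 2, 5],
    [0, 0, 2, 4, 4],
    [0, 0, 3, 3, 5],
])
y = np.array([0, 0, 0, 1, 1, 1])                 # assumed category labels for MI scoring

scores = mutual_information(X, y)
keep = np.sort(np.argsort(scores)[::-1][:4])     # keep the 4 highest-MI terms
Z = pca_project(X[:, keep], n_components=2)      # documents in the PCA subspace
labels = kmeans(Z, k=2)
print(keep, labels)
```

In this toy setup the term that occurs in every document carries no information about the category and is dropped before the PCA step, which is the dimension-reduction effect the abstract attributes to mutual-information selection.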