• DocumentCode
    1881729
  • Title
    Feature matricization for document classification
  • Author
    Sanguansat, Parinya
  • Author_Institution
    Fac. of Eng. & Technol., Panyapiwat Inst. of Manage., Nonthaburi, Thailand
  • fYear
    2012
  • fDate
    12-15 Aug. 2012
  • Firstpage
    745
  • Lastpage
    749
  • Abstract
    In text classification, the dimension of the feature vector generally depends on the number of words in the specific domain, and a large collection of documents across the considered categories makes this number large. The feature vector is therefore very high-dimensional, which makes it costly to process in both time and memory. It also causes the small sample size problem when the number of available training documents is far smaller than the dimension of the feature vectors. This paper proposes an alternative dimensionality reduction technique that works in a two-dimensional manner: the feature vector is first transformed into a feature matrix, and Two-Dimensional Principal Component Analysis (2DPCA) is then used to reduce the dimension of this feature matrix. With 2DPCA, the original weighted term matrix no longer needs to be stored in memory, because the scatter matrix of 2DPCA can be computed incrementally. A small reduction in matrix form corresponds to a large dimensionality reduction in vector form. In experiments on a well-known dataset, the proposed method not only significantly reduces the dimensionality but also achieves a higher accuracy rate than the original feature space.
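    The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes the standard 2DPCA formulation (scatter matrix G = (1/N) Σ (A_i − Ā)ᵀ(A_i − Ā), projection Y = A·X onto the top-k eigenvectors of G) and is not taken from the paper itself. The names `matricize` and `Incremental2DPCA`, the row-major reshape with zero-padding, and the choice of matrix shape are all hypothetical choices made for illustration. The incremental part relies on the identity G = (1/N) Σ A_iᵀA_i − ĀᵀĀ, so only two running sums are kept and the full weighted term matrix is never stored.

```python
import numpy as np


def matricize(vec, rows, cols):
    """Reshape a 1-D feature vector into a rows x cols matrix (row-major).

    Assumption: the tail is zero-padded when len(vec) < rows * cols.
    """
    vec = np.asarray(vec, dtype=float)
    if vec.size > rows * cols:
        raise ValueError("rows * cols must be at least the vector length")
    padded = np.zeros(rows * cols)
    padded[:vec.size] = vec
    return padded.reshape(rows, cols)


class Incremental2DPCA:
    """Hypothetical helper: accumulates the 2DPCA scatter matrix one document at a time."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.count = 0
        self.sum_a = np.zeros((rows, cols))    # running sum of A_i
        self.sum_ata = np.zeros((cols, cols))  # running sum of A_i^T A_i

    def partial_fit(self, vec):
        """Fold one document's feature vector into the running sums."""
        a = matricize(vec, self.rows, self.cols)
        self.count += 1
        self.sum_a += a
        self.sum_ata += a.T @ a
        return self

    def scatter(self):
        """Return G = (1/N) sum A_i^T A_i - mean^T mean (cols x cols)."""
        mean = self.sum_a / self.count
        return self.sum_ata / self.count - mean.T @ mean

    def components(self, k):
        """Return the k eigenvectors of G with the largest eigenvalues (cols x k)."""
        eigvals, eigvecs = np.linalg.eigh(self.scatter())
        order = np.argsort(eigvals)[::-1][:k]
        return eigvecs[:, order]

    def transform(self, vec, k):
        """Project one document's feature matrix to a rows x k matrix."""
        return matricize(vec, self.rows, self.cols) @ self.components(k)


# Usage with synthetic data: 50 documents, 1000 term weights each.
rng = np.random.default_rng(0)
docs = rng.random((50, 1000))
model = Incremental2DPCA(rows=25, cols=40)
for doc in docs:
    model.partial_fit(doc)
reduced = model.transform(docs[0], k=5)
```

    In this synthetic example, each 1000-dimensional vector becomes a 25 x 40 matrix, and keeping k = 5 of the 40 columns leaves a 25 x 5 matrix, i.e. 125 values per document. This is the sense in which a small reduction in matrix form gives a large reduction in vector form.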
  • Keywords
    classification; feature extraction; matrix algebra; principal component analysis; text analysis; dimensionality reduction; document classification; feature matrix; feature space; feature vector; text classification; two dimensional principal component analysis; Accuracy; Covariance matrix; Feature extraction; Machine learning; Principal component analysis; Support vector machines; Vectors; Document classification; Feature extraction; Matricization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing, Communication and Computing (ICSPCC), 2012 IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4673-2192-1
  • Type
    conf
  • DOI
    10.1109/ICSPCC.2012.6335622
  • Filename
    6335622