Title :
Feature matricization for document classification
Author :
Sanguansat, Parinya
Author_Institution :
Fac. of Eng. & Technol., Panyapiwat Inst. of Manage., Nonthaburi, Thailand
Abstract :
In text classification, the dimension of the feature vector generally depends on the number of distinct words in the target domain, and a large collection of documents across the considered categories makes this number very large. The resulting high-dimensional feature vectors consume a great deal of time and memory to process. They also cause the small sample size problem when the number of available training documents is far smaller than the dimension of the feature vectors. This paper proposes an alternative dimensionality reduction technique that works in a two-dimensional manner: the feature vector is first transformed into a feature matrix, and Two-Dimensional Principal Component Analysis (2DPCA) is then used to reduce the dimension of this feature matrix. With 2DPCA, the original weighted term matrix no longer needs to be stored in memory, because the scatter matrix of 2DPCA can be computed incrementally. A small reduction in matrix form translates into a large dimensionality reduction in vector form. Experimental results on a well-known dataset show that the proposed method not only significantly reduces the dimensionality but also achieves a higher accuracy rate than the original feature space.
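The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the author's implementation: the row-major reshaping with zero-padding, the class name `Incremental2DPCA`, the helper `matricize`, and the matrix shape of 40 x 25 are all assumptions chosen for illustration, since the abstract does not specify them. The sketch accumulates only two running sums per document, so the full weighted term matrix never has to be held in memory.

```python
import numpy as np


def matricize(x, rows, cols):
    """Reshape a 1-D feature vector into a rows x cols feature matrix.

    Assumption: row-major layout, zero-padded when the vector is shorter
    than rows * cols. The abstract does not specify the layout.
    """
    x = np.asarray(x, dtype=float).ravel()
    if x.size > rows * cols:
        raise ValueError("feature vector does not fit in rows x cols")
    padded = np.zeros(rows * cols)
    padded[: x.size] = x
    return padded.reshape(rows, cols)


class Incremental2DPCA:
    """Hypothetical helper: 2DPCA with an incrementally computed scatter matrix."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.n = 0
        self.sum_A = np.zeros((rows, cols))    # running sum of A_i
        self.sum_AtA = np.zeros((cols, cols))  # running sum of A_i^T A_i

    def partial_fit(self, x):
        """Update the running sums with one document's feature vector."""
        A = matricize(x, self.rows, self.cols)
        self.n += 1
        self.sum_A += A
        self.sum_AtA += A.T @ A

    def scatter(self):
        """Scatter matrix G = (1/n) * sum_i (A_i - M)^T (A_i - M).

        Expanding the product gives (1/n) * sum_i A_i^T A_i - M^T M,
        which needs only the two running sums.
        """
        mean = self.sum_A / self.n
        return self.sum_AtA / self.n - mean.T @ mean

    def components(self, d):
        """Return the d eigenvectors of G with the largest eigenvalues."""
        eigvals, eigvecs = np.linalg.eigh(self.scatter())
        order = np.argsort(eigvals)[::-1][:d]
        return eigvecs[:, order]

    def transform(self, x, d):
        """Project one document: (rows x cols) @ (cols x d) -> rows x d."""
        return matricize(x, self.rows, self.cols) @ self.components(d)


# Usage with synthetic data: 20 documents, 1000 term weights each.
rng = np.random.default_rng(0)
docs = rng.random((20, 1000))
model = Incremental2DPCA(rows=40, cols=25)
for doc in docs:
    model.partial_fit(doc)
Y = model.transform(docs[0], d=5)
print(Y.shape)  # (40, 5)
```

In this sketch each document goes from 1000 features to 40 x 5 = 200, and every projection column that is dropped removes 40 features from the flattened vector, which is the sense in which a small reduction in matrix form yields a large reduction in vector form.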
Keywords :
classification; feature extraction; matrix algebra; principal component analysis; text analysis; dimensionality reduction; document classification; feature matrix; feature space; feature vector; text classification; two dimensional principal component analysis; Accuracy; Covariance matrix; Machine learning; Support vector machines; Vectors; Matricization
Conference_Title :
Signal Processing, Communication and Computing (ICSPCC), 2012 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4673-2192-1
DOI :
10.1109/ICSPCC.2012.6335622