DocumentCode
1881729
Title
Feature matricization for document classification
Author
Sanguansat, Parinya
Author_Institution
Fac. of Eng. & Technol., Panyapiwat Inst. of Manage., Nonthaburi, Thailand
fYear
2012
fDate
12-15 Aug. 2012
Firstpage
745
Lastpage
749
Abstract
In text classification, the dimension of the feature vector generally depends on the number of words in the specific domain, and a large number of documents across the considered categories makes this vocabulary large. The resulting feature vectors are therefore very high-dimensional and consume a lot of time and memory to process. Moreover, this causes the small sample size problem when the number of available training documents is far smaller than the dimension of the feature vectors. This paper proposes an alternative dimensionality reduction technique that works in a two-dimensional manner: the feature vector is first transformed into a feature matrix, and Two-Dimensional Principal Component Analysis (2DPCA) is then used to reduce the dimension of this feature matrix. With 2DPCA, the original weighted term matrix no longer needs to be stored in memory, because the scatter matrix of 2DPCA can be computed incrementally. A small reduction in matrix form translates into a large dimensionality reduction in vector form. In experiments on a well-known dataset, the proposed method not only significantly reduces the dimensionality but also achieves a higher accuracy rate than the original feature space.
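The pipeline described in the abstract (matricize each feature vector, accumulate the 2DPCA scatter matrix incrementally, then project) can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the row-major reshape with zero padding, the function names `matricize`, `fit_2dpca`, and `project`, and the matrix shape are all assumptions, since the abstract does not specify them. The scatter matrix is built from running sums, using the identity G = (1/n) Σ AᵢᵀAᵢ − ĀᵀĀ, so only one document matrix is held at a time.

```python
import numpy as np


def matricize(x, rows, cols):
    """Reshape a feature vector into a rows x cols matrix.

    Assumption: row-major reshape, zero-padded when the vector is
    shorter than rows * cols. The paper may use a different layout.
    """
    x = np.asarray(x, dtype=float).ravel()
    if x.size > rows * cols:
        raise ValueError("feature vector does not fit in rows x cols")
    padded = np.zeros(rows * cols)
    padded[: x.size] = x
    return padded.reshape(rows, cols)


def fit_2dpca(vectors, rows, cols, k):
    """Estimate a cols x k projection from an iterable of feature vectors.

    Only running sums are kept, so the full weighted term matrix
    never has to sit in memory.
    """
    n = 0
    sum_a = np.zeros((rows, cols))      # running sum of A_i
    sum_ata = np.zeros((cols, cols))    # running sum of A_i^T A_i
    for x in vectors:
        a = matricize(x, rows, cols)
        sum_a += a
        sum_ata += a.T @ a
        n += 1
    if n == 0:
        raise ValueError("no training vectors given")
    mean = sum_a / n
    # G = (1/n) * sum (A_i - mean)^T (A_i - mean), expanded into running sums.
    scatter = sum_ata / n - mean.T @ mean
    eigvals, eigvecs = np.linalg.eigh(scatter)   # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    return eigvecs[:, top]


def project(x, projection, rows):
    """Map one feature vector to a reduced rows x k feature matrix."""
    cols = projection.shape[0]
    return matricize(x, rows, cols) @ projection
```

For example, with hypothetical 1000-dimensional term vectors arranged as 40 x 25 matrices and k = 5, each document is reduced to a 40 x 5 matrix, that is, 200 values instead of 1000. This shows how dropping a few columns in matrix form removes many dimensions in vector form.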
Keywords
classification; feature extraction; matrix algebra; principal component analysis; text analysis; dimensionality reduction; document classification; feature matrix; feature space; feature vector; text classification; two dimensional principal component analysis; Accuracy; Covariance matrix; Feature extraction; Machine learning; Principal component analysis; Support vector machines; Vectors; Document classification; Feature extraction; Matricization;
fLanguage
English
Publisher
ieee
Conference_Titel
Signal Processing, Communication and Computing (ICSPCC), 2012 IEEE International Conference on
Conference_Location
Hong Kong
Print_ISBN
978-1-4673-2192-1
Type
conf
DOI
10.1109/ICSPCC.2012.6335622
Filename
6335622