Title :
An experimental investigation on PCA based on cosine similarity and correlation for text feature dimensionality reduction
Author :
Maysa I Abdulhussain;John Q Gan
Author_Institution :
School of Computer Science and Electronic Engineering, University of Essex, Colchester, Essex CO4 3SQ, UK
Abstract :
Principal component analysis (PCA) is a commonly used method for feature extraction and dimensionality reduction. This paper proposes a PCA variant that replaces the covariance matrix with a cosine similarity or correlation matrix, with the aim of obtaining low-dimensional features that perform well in text classification. Experimental results demonstrate the advantage of the proposed method for text classification in high-dimensional feature spaces, measured by the number of features required to achieve the best classification accuracy.
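The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `pca_by_criterion`, the choice to compare feature columns (rather than documents), and the projection of the uncentered data are all assumptions made for illustration. The sketch relies on the fact that correlation equals the cosine similarity of mean-centered columns, so the three criteria differ only in how the columns are preprocessed.

```python
import numpy as np

def pca_by_criterion(X, k, criterion="cosine"):
    """Reduce X (documents x features) to k dimensions.

    The feature-by-feature matrix that is eigendecomposed depends on
    `criterion`: "covariance", "correlation", or "cosine".
    Hypothetical helper, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    if criterion == "covariance":
        Z = X - X.mean(axis=0)            # center each feature
        M = Z.T @ Z / (n - 1)             # sample covariance matrix
    elif criterion in ("correlation", "cosine"):
        # Correlation centers the columns first; cosine does not.
        Z = X - X.mean(axis=0) if criterion == "correlation" else X.copy()
        norms = np.linalg.norm(Z, axis=0)
        norms[norms == 0] = 1.0           # assumption: leave all-zero columns as zeros
        Z = Z / norms                     # unit-length feature columns
        M = Z.T @ Z                       # pairwise cosine of the columns
    else:
        raise ValueError(f"unknown criterion: {criterion}")

    # M is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    W = eigvecs[:, order]                  # features x k projection matrix
    return X @ W, W                        # assumption: project the raw data

# Usage on random stand-in data (not a real text corpus).
rng = np.random.default_rng(0)
X_demo = rng.random((20, 6))
reduced, W_demo = pca_by_criterion(X_demo, 3, criterion="cosine")
print(reduced.shape)  # (20, 3)
```

In a text setting, X would typically be a term-weight matrix such as TF-IDF, and the reduced features would be passed to a classifier; the number of components k is the quantity the abstract uses to compare the criteria.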
Keywords :
"Principal component analysis","Covariance matrices","Correlation","Accuracy","Electronic mail","Computer science","Support vector machines"
Conference_Title :
Computer Science and Electronic Engineering Conference (CEEC), 2015 7th
DOI :
10.1109/CEEC.2015.7332689