Title :
Text Feature Selection Based on Class Subspace
Author :
Xiaofei Zhou ; Li Guo ; Tianyi Wang ; Yue Hu
Author_Institution :
Inst. of Inf. Eng., Beijing, China
Abstract :
For text data, feature dimension reduction is an important task for simplifying document representation and improving the computational efficiency of learning algorithms. There are two main dimension reduction strategies: feature extraction and feature selection. Feature extraction creates new features to represent documents, whereas feature selection returns a subset of the original words as features. Comparing the two strategies, feature extraction has a powerful capacity for reducing dimensionality, but it loses the intuitive semantics of documents. Feature selection preserves the interpretability of text content, which makes it especially valuable for text dimension reduction, but designing a suitable measure for feature evaluation remains difficult. In this paper we present a new feature selection method called class subspace feature selection (CSFS). We use the PCA feature extraction method to capture lower-dimensional class subspaces, and then, based on these subspaces, choose the features most relevant to them. The feature words chosen by our method approximate the class subspace, which has lower dimensionality, while retaining an intuitive semantic interpretation of the class. Experimental results on three text data sets show the effectiveness of the proposed feature selection method.
Keywords :
data reduction; learning (artificial intelligence); principal component analysis; text analysis; CSFS; PCA feature extraction method; class subspace; class subspace feature selection method; dimensionality reduction; document representation; feature dimension reduction; feature extraction; intuitive semantic understanding; learning algorithm; text content interpretability; text data; text feature selection; Electronic mail; Feature extraction; Principal component analysis; Semantics; Text categorization; Vectors; Feature selection; dimension reduction; text categorization;
Conference_Title :
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4275-6
DOI :
10.1109/ICDMW.2014.99
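The abstract states only that CSFS fits a lower-dimensional PCA subspace for each class and then keeps the words most relevant to those subspaces; it does not give the relevance measure. As a rough illustration, the Python sketch below assumes one plausible measure, which is not necessarily the authors': for each class, center that class's document-term rows, take its top principal directions as the class subspace, and score word j by the squared length of its unit axis projected onto that subspace (a value between 0 and 1), keeping the best score over all classes. The function names `class_subspace_scores` and `select_features`, the `n_components` parameter, the max-over-classes aggregation, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def class_subspace_scores(X, y, n_components=2, tol=1e-10):
    """Hypothetical CSFS-style scorer (assumed measure, not the paper's).

    X: (n_docs, n_features) document-term matrix, e.g. tf-idf weights.
    y: class label for each row of X.
    Returns an (n_features,) array of scores in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        Xc = Xc - Xc.mean(axis=0)  # center the documents within this class
        # Rows of vt are this class's principal directions in word space.
        _, s, vt = np.linalg.svd(Xc, full_matrices=False)
        # Keep at most n_components directions, skipping null (zero-variance) ones.
        k = min(n_components, int(np.sum(s > tol)))
        if k == 0:
            continue
        basis = vt[:k]  # (k, n_features), orthonormal rows spanning the subspace
        # ||P e_j||^2: squared length of word axis j projected onto the subspace.
        class_score = np.sum(basis ** 2, axis=0)
        # Assumed aggregation: a word counts if it fits any one class well.
        scores = np.maximum(scores, class_score)
    return scores


def select_features(X, y, m, n_components=2):
    """Return indices of the m highest-scoring words (hypothetical helper)."""
    scores = class_subspace_scores(X, y, n_components)
    return np.argsort(-scores, kind="stable")[:m]


if __name__ == "__main__":
    # Toy data: class 0 varies only in word 0, class 1 only in word 2,
    # and word 4 is constant everywhere.
    X = [[1, 0, 0, 0, 1], [3, 0, 0, 0, 1], [5, 0, 0, 0, 1],
         [0, 0, 1, 0, 1], [0, 0, 4, 0, 1]]
    y = [0, 0, 0, 1, 1]
    print(class_subspace_scores(X, y).round(3))
    print(sorted(select_features(X, y, m=2).tolist()))  # words 0 and 2
```

Under this assumed measure, a word scores 1 when its axis lies entirely inside some class subspace and 0 when it never varies within any class. Weighting the loadings by singular values, or summing rather than maximizing over classes, would be equally plausible variants.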