DocumentCode
3724164
Title
GS-Orthogonalization Based "Basis Feature" Selection from Word Co-occurrence Matrix
Author
Deqing Wang;Hui Zhang;Rui Liu
Author_Institution
Sch. of Comput. Sci., Beihang Univ., Beijing, China
fYear
2015
Firstpage
1027
Lastpage
1032
Abstract
Feature selection plays an important role in machine learning applications. For text data in particular, high dimensionality and sparsity affect the performance of feature selection. In this paper, an unsupervised feature selection algorithm based on Random Projection and Gram-Schmidt Orthogonalization (RP-GSO) over the word co-occurrence matrix is proposed. RP-GSO has three advantages: (1) it takes as input a dense word co-occurrence matrix, avoiding the sparseness of the original document-term matrix; (2) it selects "basis features" by the Gram-Schmidt process, guaranteeing the orthogonality of the feature space; and (3) it adopts random projection to speed up the GS process. We conducted extensive experiments on two real-world text corpora and observed that RP-GSO achieves better performance than the supervised and unsupervised methods it was compared against in text classification and clustering tasks.
Keywords
"Sparse matrices","Feature extraction","Training","Clustering algorithms","MATLAB","Computer science","Matrix decomposition"
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2015 IEEE International Conference on
ISSN
1550-4786
Type
conf
DOI
10.1109/ICDM.2015.80
Filename
7373430