• DocumentCode
    3724164
  • Title

    GS-Orthogonalization Based "Basis Feature" Selection from Word Co-occurrence Matrix

  • Author

    Deqing Wang;Hui Zhang;Rui Liu

  • Author_Institution
    Sch. of Comput. Sci., Beihang Univ., Beijing, China
  • fYear
    2015
  • Firstpage
    1027
  • Lastpage
    1032
  • Abstract
    Feature selection plays an important role in machine learning applications. For text data in particular, the high-dimensional and sparse characteristics affect the performance of feature selection. In this paper, an unsupervised feature selection algorithm through Random Projection and Gram-Schmidt Orthogonalization (RP-GSO) from the word co-occurrence matrix is proposed. RP-GSO has three advantages: (1) it takes a dense word co-occurrence matrix as input, avoiding the sparseness of the original document-term matrix; (2) it selects "basis features" by the Gram-Schmidt process, guaranteeing the orthogonality of the feature space; and (3) it adopts random projection to speed up the GS process. We conducted extensive experiments on two real-world text corpora and observed that RP-GSO achieves better performance than supervised and unsupervised methods in text classification and clustering tasks.
  • Keywords
    "Sparse matrices","Feature extraction","Training","Clustering algorithms","MATLAB","Computer science","Matrix decomposition"
  • Publisher
    ieee
  • Conference_Titel
    Data Mining (ICDM), 2015 IEEE International Conference on
  • ISSN
    1550-4786
  • Type

    conf

  • DOI
    10.1109/ICDM.2015.80
  • Filename
    7373430