• DocumentCode
    398199
  • Title
    Combinatorial PCA and SVM methods for feature selection in learning classifications (applications to text categorization)
  • Author
    Anghelescu, Andrei V. ; Muchnik, Ilya B.
  • Author_Institution
    Dept. of Comput. Sci., Rutgers Univ., USA
  • fYear
    2003
  • fDate
    30 Sept.-4 Oct. 2003
  • Firstpage
    491
  • Lastpage
    496
  • Abstract
    We describe a purely combinatorial approach to obtaining meaningful representations of text data. More precisely, we describe two methods that implement this approach: combinatorial principal component analysis (cPCA) and combinatorial support vector machines (cSVM). These names emphasise the mathematical analogies between the well-known PCA and SVM, on the one hand, and our respective methods, on the other. To evaluate the selected feature spaces, we used the TREC 2002 environment and a very common classifier, 1-nearest neighbour (1-NN). We compared the results obtained on the feature sets computed by cPCA and cSVM with the results obtained on the original feature space. We showed that selecting a feature space that is, on average, 50 times smaller than the original decreases the performance of the classifier by no more than 2%.
  • Keywords
    learning (artificial intelligence); principal component analysis; support vector machines; text analysis; 1-NN; 1-nearest neighbour; cPCA; cSVM; combinatorial principal component analysis; combinatorial support vector machine; feature selection; learning classification; mathematical analogy; original feature space; text categorization application; Application software; Classification algorithms; Computer science; Degradation; Nonlinear filters; Principal component analysis; Support vector machine classification; Support vector machines; Text categorization; USA Councils;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Integration of Knowledge Intensive Multi-Agent Systems, 2003. International Conference on
  • Print_ISBN
    0-7803-7958-6
  • Type
    conf
  • DOI
    10.1109/KIMAS.2003.1245090
  • Filename
    1245090