• DocumentCode
    2865330
  • Title

    Efficient text classification by weighted proximal SVM

  • Author

    Zhuang, Dong ; Zhang, Benyu ; Yang, Qiang ; Yan, Jun ; Chen, Zheng ; Chen, Ying

  • Author_Institution
    Comput. Sci. & Eng., Beijing Inst. of Technol., China
  • fYear
    2005
  • fDate
    27-30 Nov. 2005
  • Abstract
    In this paper, we present an algorithm that can classify large-scale text data with high classification quality and fast training speed. Our method is based on a novel extension of the proximal SVM model (Fung and Mangasarian, 2001). Previous studies on proximal SVM have focused on classification of low-dimensional data and did not consider unbalanced data. Such methods run into difficulties when classifying unbalanced, high-dimensional data sets such as text documents. In this work, we extend the original proximal SVM by learning a weight for each training error. We show that the classification algorithm based on this model is capable of handling high-dimensional and unbalanced data. In the experiments, we compare our method with the original proximal SVM (a special case of our algorithm) and the standard SVM (such as SVMlight) on the recently published RCV1-v2 dataset. The results show that our proposed method achieves classification quality comparable to that of the standard SVM. At the same time, both the time and memory consumption of our method are lower than those of the standard SVM.
  • Keywords
    support vector machines; text analysis; high dimensional data; large-scale text data; text classification; text documents; unbalanced data; weight learning; weighted proximal support vector machine; Asia; Classification algorithms; Computer science; Information science; Large-scale systems; Standards publication; Statistical learning; Support vector machine classification; Support vector machines; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, Fifth IEEE International Conference on
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2278-5
  • Type

    conf

  • DOI
    10.1109/ICDM.2005.56
  • Filename
    1565722
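
The abstract describes extending the proximal SVM with a weight on each training error. The record does not give the formulation, so the following is only a minimal sketch under stated assumptions: it takes the linear proximal SVM objective of Fung and Mangasarian (2001), multiplies each squared error by a per-example weight, and solves the resulting regularized linear system in closed form. The function names (`fit_weighted_psvm`, `balanced_weights`, `predict`) and the class-frequency weighting are hypothetical choices for illustration, not the weighting scheme learned in the paper. With all weights equal to 1 the sketch reduces to the original proximal SVM.

```python
import numpy as np


def fit_weighted_psvm(A, d, nu=1.0, sample_weight=None):
    """Fit a linear proximal SVM with a weight on each squared training error.

    Minimizes (nu/2) * sum_i s_i * (d_i - (x_i . w - gamma))**2
              + (1/2) * (w . w + gamma**2).
    A: (m, n) feature matrix, d: (m,) labels in {-1, +1},
    sample_weight: (m,) nonnegative weights s_i (assumed given, not learned).
    """
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    m, n = A.shape
    s = np.ones(m) if sample_weight is None else np.asarray(sample_weight, dtype=float)

    # E = [A, -e] so that E @ [w; gamma] = A @ w - gamma.
    E = np.hstack([A, -np.ones((m, 1))])

    # Normal equations of the weighted objective:
    # (I / nu + E^T S E) [w; gamma] = E^T S d
    lhs = np.eye(n + 1) / nu + E.T @ (s[:, None] * E)
    rhs = E.T @ (s * d)
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n]


def balanced_weights(d):
    """Hypothetical weighting: inversely proportional to class frequency."""
    d = np.asarray(d)
    n_pos = np.sum(d == 1)
    n_neg = np.sum(d == -1)
    return np.where(d == 1, len(d) / (2.0 * n_pos), len(d) / (2.0 * n_neg))


def predict(A, w, gamma):
    """Assign +1 or -1 according to the sign of x . w - gamma."""
    return np.where(np.asarray(A, dtype=float) @ w - gamma >= 0, 1, -1)


if __name__ == "__main__":
    # Tiny unbalanced toy set: one positive example, four negatives.
    A = np.array([[2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0]])
    d = np.array([1, -1, -1, -1, -1])
    w, gamma = fit_weighted_psvm(A, d, nu=100.0, sample_weight=balanced_weights(d))
    print(predict(A, w, gamma))
```

This dense version only illustrates the algebra. For high-dimensional text data such as RCV1-v2, a practical implementation would presumably need sparse matrices and an iterative solver rather than forming and solving the full system directly.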