• DocumentCode
    2507003
  • Title
    Dimension reduction using least squares regression in multi-labeled text categorization
  • Author
    Park, Cheong Hee
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chungnam Nat. Univ., Daejeon
  • fYear
    2008
  • fDate
    8-11 July 2008
  • Firstpage
    71
  • Lastpage
    76
  • Abstract
    Dimension reduction is a preprocessing step by which a small number of optimal features are extracted. Among several statistical dimension reduction methods, linear discriminant analysis (LDA) performs dimension reduction to maximize class separability in the reduced dimensional space. However, in multi-labeled problems, data samples belonging to multiple classes lie in the overlapping area of those classes and therefore cause a contradiction between maximizing the distances between classes and minimizing the scatter within classes. In this paper, we show that in multi-labeled text categorization, the outputs from multiple linear methods can be used to compose new features for a low-dimensional representation. In particular, we apply least squares regression and a linear support vector machine (SVM) to the multiple binary-class problems constructed from a multi-labeled problem and obtain optimal features in a low-dimensional space, which are then fed into another classification algorithm. Extensive experimental results in text categorization are presented, comparing the approach with other dimension reduction methods and multi-label classification algorithms.
  • Keywords
    classification; least squares approximations; regression analysis; support vector machines; text analysis; data samples; dimensional space reduction; feature extraction; least squares regression; linear discriminant analysis; linear support vector machine; low dimensional space; multilabel classification algorithms; multilabeled text categorization; multiple binary-class problems; multiple linear methods; statistical dimension reduction; Classification algorithms; Data mining; Indexing; Large scale integration; Least squares methods; Linear discriminant analysis; Scattering; Support vector machine classification; Support vector machines; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology, 2008. CIT 2008. 8th IEEE International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4244-2357-6
  • Electronic_ISBN
    978-1-4244-2358-3
  • Type
    conf
  • DOI
    10.1109/CIT.2008.4594652
  • Filename
    4594652
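
The abstract describes splitting a multi-labeled problem into one binary problem per label, fitting a linear method to each, and using the outputs as low-dimensional features. A minimal sketch of the least squares variant follows. It is not the author's implementation: the function names `fit_ls_projection` and `transform`, the ±1 target encoding, the added bias column, and the random toy data are all assumptions made for illustration.

```python
import numpy as np


def _add_bias(X):
    """Append a constant column so each binary regressor has an intercept (assumption)."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ls_projection(X, Y):
    """Fit one least squares regressor per label.

    X: (n_samples, n_features) term-document features.
    Y: (n_samples, n_labels) binary indicator matrix; a row may contain several 1s.
    Returns W of shape (n_features + 1, n_labels).
    """
    # Encode each binary problem with targets +1 / -1 (an assumed encoding).
    T = 2.0 * np.asarray(Y, dtype=float) - 1.0
    # lstsq solves all label columns at once and returns the minimum-norm
    # solution when the system is underdetermined.
    W, *_ = np.linalg.lstsq(_add_bias(X), T, rcond=None)
    return W


def transform(X, W):
    """Map samples to the n_labels-dimensional space of regression outputs."""
    return _add_bias(X) @ W


# Toy data (hypothetical): 20 documents, 50 terms, 3 labels.
rng = np.random.default_rng(0)
X = rng.random((20, 50))
Y = (rng.random((20, 3)) > 0.5).astype(int)

W = fit_ls_projection(X, Y)
Z = transform(X, W)  # reduced representation, one column per label
```

Here the reduced dimension equals the number of labels. Following the abstract, `Z` would then be passed to a separate classification algorithm instead of being thresholded directly.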