DocumentCode
2507003
Title
Dimension reduction using least squares regression in multi-labeled text categorization
Author
Park, Cheong Hee
Author_Institution
Dept. of Comput. Sci. & Eng., Chungnam Nat. Univ., Daejeon
fYear
2008
fDate
8-11 July 2008
Firstpage
71
Lastpage
76
Abstract
Dimension reduction is a preprocessing step by which a small number of optimal features is extracted. Among several statistical dimension reduction methods, linear discriminant analysis (LDA) performs dimension reduction to maximize class separability in the reduced dimensional space. However, in multi-labeled problems, data samples belonging to multiple classes cause a contradiction between maximizing the distances between classes and minimizing the scatter within classes, since they lie in the overlapping area of multiple classes. In this paper, we show that in multi-labeled text categorization, the outputs from multiple linear methods can be used to compose new features for a low dimensional representation. In particular, we apply least squares regression and a linear support vector machine (SVM) to the multiple binary-class problems constructed from a multi-labeled problem and obtain optimal features in a low dimensional space, which are fed into another classification algorithm. Extensive experimental results in text categorization are presented, comparing the approach with other dimension reduction methods and multi-label classification algorithms.
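The idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the +1/-1 target coding, and the small ridge term `lam` (added only to keep the normal equations solvable) are assumptions made for illustration. One least squares regressor is fitted per label, and the regressor outputs serve as the reduced features, one dimension per label.

```python
# Sketch only: per-label least squares regression used as dimension reduction.
# Helper names, the +1/-1 label coding, and `lam` are illustrative assumptions.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def fit_ls_projection(X, Y, lam=1e-3):
    """Return W (d x k) minimizing ||X W - Y||^2 + lam ||W||^2.

    X: n x d document-term matrix, Y: n x k label targets (+1 / -1),
    so each column of W is the regressor for one binary problem.
    """
    Xt = transpose(X)
    G = matmul(Xt, X)
    for i in range(len(G)):
        G[i][i] += lam
    return solve(G, matmul(Xt, Y))

def reduce_dim(X, W):
    """Map documents to k-dimensional features: one regressor output per label."""
    return matmul(X, W)

# Toy data: two indicator terms plus a constant bias column.
X = [[2, 0, 1], [0, 2, 1], [2, 2, 1], [0, 0, 1]]
# Labels A and B; the third document carries both labels.
Y = [[1, -1], [-1, 1], [1, 1], [-1, -1]]

W = fit_ls_projection(X, Y)
Z = reduce_dim(X, W)  # n x k reduced representation
```

The rows of `Z` would then be passed to a second classifier, as the abstract describes. In practice a numerical library would replace the hand-written solver.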
Keywords
classification; least squares approximations; regression analysis; support vector machines; text analysis; data samples; dimensional space reduction; feature extraction; least squares regression; linear discriminant analysis; linear support vector machine; low dimensional space; multilabel classification algorithms; multilabeled text categorization; multiple binary-class problems; multiple linear methods; statistical dimension reduction; Classification algorithms; Data mining; Indexing; Large scale integration; Least squares methods; Linear discriminant analysis; Scattering; Support vector machine classification; Support vector machines; Text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Technology, 2008. CIT 2008. 8th IEEE International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-1-4244-2357-6
Electronic_ISBN
978-1-4244-2358-3
Type
conf
DOI
10.1109/CIT.2008.4594652
Filename
4594652