DocumentCode :
383428
Title :
Discriminative features for document classification
Author :
Torkkola, Kari
Author_Institution :
Motorola Labs, Tempe, AZ, USA
Volume :
1
fYear :
2002
fDate :
2002
Firstpage :
472
Abstract :
Document representation using the bag-of-words approach may require bringing the dimensionality of the representation down in order to be able to make effective use of various statistical classification methods. Latent Semantic Indexing (LSI) is one such method that is based on eigendecomposition of the covariance of the document-term matrix. Another often used approach is to select a small number of most important features out of the whole set according to some relevant criterion. This paper points out that LSI ignores discrimination while concentrating on representation. Furthermore, selection methods fail to produce a feature set that jointly optimizes class discrimination. As a remedy, we suggest supervised linear discriminative transforms, and report good classification results applying these to the Reuters-21578 database.
Keywords :
document image processing; eigenvalues and eigenfunctions; image classification; image representation; Reuters-21578 database; bag-of-words approach; discriminative features; document classification; document representation; document-term matrix; eigendecomposition; latent semantic indexing; statistical classification methods; supervised linear discriminative transforms; Covariance matrix; Databases; Electronic mail; Indexing; Linear discriminant analysis; Optimization methods; Pattern recognition; Rivers; Support vector machine classification; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 2002. Proceedings. 16th International Conference on
ISSN :
1051-4651
Print_ISBN :
0-7695-1695-X
Type :
conf
DOI :
10.1109/ICPR.2002.1044765
Filename :
1044765