DocumentCode :
2954621
Title :
Text Categorization for Multi-label Documents and Many Categories
Author :
Popa, I. Sandu ; Zeitouni, K. ; Gardarin, G. ; Nakache, D. ; Metais, E.
Author_Institution :
PRiSM Lab., Versailles
fYear :
2007
fDate :
20-22 June 2007
Firstpage :
421
Lastpage :
426
Abstract :
In this paper, we propose a new classification method that addresses the classification of textual documents into multiple categories. We call it Matrix Regression (MR) due to its resemblance to regression in a high-dimensional space. Experiments on a medical corpus of hospital records to be classified by ICD (International Classification of Diseases) code demonstrate the validity of the MR approach. We compared MR with three algorithms frequently used in text categorization, namely k-Nearest Neighbors, Centroid, and Support Vector Machine. The experimental results show that our method outperforms them in both precision and classification time.
Keywords :
biology computing; medical administrative data processing; hospital records; k-nearest neighbor method; matrix regression; medical corpus; multilabel documents; support vector machine; text categorization; Hospitals; Laboratories; Learning systems; Machine learning; Supervised learning; Support vector machine classification; Support vector machines; Testing; Text categorization; Unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer-Based Medical Systems, 2007. CBMS '07. Twentieth IEEE International Symposium on
Conference_Location :
Maribor
ISSN :
1063-7125
Print_ISBN :
0-7695-2905-4
Type :
conf
DOI :
10.1109/CBMS.2007.108
Filename :
4262685