DocumentCode :
2335625
Title :
Combining labeled and unlabeled data for text classification with a large number of categories
Author :
Ghani, Rayid
Author_Institution :
Center for Automated Learning & Discovery, Carnegie Mellon Univ., USA
fYear :
2001
fDate :
2001
Firstpage :
597
Lastpage :
598
Abstract :
We develop a framework to incorporate unlabeled data in the error-correcting output coding (ECOC) setup by decomposing multiclass problems into multiple binary problems and then using co-training to learn the individual binary classification problems. We show that our method is especially useful for classification tasks involving a large number of categories, where co-training by itself does not perform well, and that, when combined with ECOC, it outperforms several other algorithms that combine labeled and unlabeled data for text classification in terms of accuracy, precision-recall tradeoff, and efficiency.
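As a hedged illustration of the ECOC decomposition the abstract describes, the Python sketch below relabels a multiclass problem into binary problems (one per column of a code matrix) and decodes a vector of predicted bits back to a class by minimum Hamming distance. The 4-class, 5-bit code matrix and the names `CODE_MATRIX`, `binary_labels`, `hamming`, and `ecoc_decode` are hypothetical choices for illustration only. The codes used in the paper are not given in this record, and the per-bit co-training learners are deliberately not shown.

```python
from typing import List, Sequence

# Hypothetical 4-class, 5-bit code matrix (illustrative, not from the paper).
# Each row is a class codeword; each column defines one binary problem.
# The minimum pairwise Hamming distance between rows is 3, so any single
# wrong bit prediction can still be decoded to the correct class.
CODE_MATRIX: List[List[int]] = [
    [0, 0, 1, 1, 0],  # class 0
    [0, 1, 0, 1, 1],  # class 1
    [1, 0, 0, 0, 1],  # class 2
    [1, 1, 1, 0, 0],  # class 3
]


def binary_labels(y: Sequence[int], code_matrix: List[List[int]], bit: int) -> List[int]:
    """Relabel multiclass targets into the binary problem given by column `bit`.

    In the framework described above, each such binary problem would be
    learned by a co-training classifier; that learner is omitted here.
    """
    return [code_matrix[label][bit] for label in y]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits: Sequence[int], code_matrix: List[List[int]]) -> int:
    """Return the class whose codeword is nearest to the predicted bits."""
    distances = [hamming(bits, row) for row in code_matrix]
    return distances.index(min(distances))


if __name__ == "__main__":
    # Targets for the first binary problem (column 0).
    print(binary_labels([0, 1, 2, 3], CODE_MATRIX, 0))  # prints [0, 0, 1, 1]
    # Class 2's codeword is 10001; here bit 1 was mispredicted as 1.
    print(ecoc_decode([1, 1, 0, 0, 1], CODE_MATRIX))  # prints 2
```

The single-bit error in the usage example is absorbed by the code's redundancy, which is the property that lets ECOC tolerate some weak binary classifiers.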
Keywords :
error correction codes; learning (artificial intelligence); pattern classification; text analysis; accuracy; binary classification problems; categories; co-training; error correcting output coding setup; labeled data; multiclass problems; multiple binary problems; precision-recall tradeoff; text classification; unlabeled data; Classification algorithms; Labeling; Supervised learning; Testing; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
Type :
conf
DOI :
10.1109/ICDM.2001.989574
Filename :
989574