DocumentCode :
1947112
Title :
Learning to integrate unlabeled data in text classification
Author :
Jiang, Eric P.
Author_Institution :
Univ. of San Diego, San Diego, CA, USA
Volume :
4
fYear :
2010
fDate :
9-11 July 2010
Firstpage :
82
Lastpage :
86
Abstract :
The paper addresses text classification when labeled training samples are very limited but unlabeled data are readily available in large quantities. It proposes an efficient classification algorithm that incorporates a weighted k-means clustering scheme into an Expectation Maximization (EM) process, aiming to balance the predictive value of labeled and unlabeled training data and to improve classification accuracy. Because the algorithm is based on a fast clustering method, it can be applied to classify documents in large datasets. Preliminary experiments with several text classification collections show that the proper use of unlabeled data, as built into the proposed algorithm, can significantly improve classification accuracy.
Keywords :
expectation-maximisation algorithm; pattern classification; pattern clustering; text analysis; classification algorithm; expectation maximization process; labeled training samples; text classification; weighted k-means clustering scheme; Accuracy; classification; clustering; feature selection;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-5537-9
Type :
conf
DOI :
10.1109/ICCSIT.2010.5564473
Filename :
5564473