Regional Information Center for Science and Technology - Learning to integrate unlabeled data in text classification

DocumentCode :

1947112

Title :

Learning to integrate unlabeled data in text classification

Author :

Jiang, Eric P.

Author_Institution :

Univ. of San Diego, San Diego, CA, USA

Volume :

fYear :

2010

fDate :

9-11 July 2010

Firstpage :

Lastpage :

Abstract :

The paper addresses text classification when labeled training samples are very limited but unlabeled data are readily available in large quantities. It proposes an efficient classification algorithm that incorporates a weighted k-means clustering scheme into an Expectation Maximization (EM) process, aiming to balance the predictive value of labeled and unlabeled training data and to improve classification accuracy. Because the algorithm is built on a fast clustering method, it can be applied to classify documents in large datasets. Preliminary experiments with several text classification collections show that the proper use of unlabeled data, as built into the proposed algorithm, could significantly improve classification accuracy.
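The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea it names: a k-means loop whose assignment and centroid-update steps play the roles of the E and M steps, with unlabeled documents down-weighted relative to labeled ones. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the `unlabeled_weight` parameter, hard (rather than soft) assignments, cosine similarity, and initialising centroids from the labeled documents only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_centroids(points, labels, weights, n_classes):
    """Weighted mean of the points assigned to each class."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    totals = [0.0] * n_classes
    for x, y, w in zip(points, labels, weights):
        totals[y] += w
        for j in range(dim):
            sums[y][j] += w * x[j]
    # A class with no weight keeps its all-zero vector.
    return [[s / t for s in row] if t > 0 else row
            for row, t in zip(sums, totals)]


def classify(x, centroids):
    """Index of the centroid most similar to x."""
    return max(range(len(centroids)), key=lambda c: cosine(x, centroids[c]))


def semi_supervised_kmeans(labeled, labels, unlabeled, n_classes,
                           unlabeled_weight=0.1, max_iter=20):
    """Hypothetical weighted k-means with EM-style alternation.

    labeled / unlabeled are lists of equal-length feature vectors;
    labels holds a class index in range(n_classes) for each labeled vector.
    unlabeled_weight (an assumed knob) scales the influence of unlabeled data.
    """
    # Assumption: start from centroids of the labeled documents alone.
    centroids = weighted_centroids(labeled, labels,
                                   [1.0] * len(labeled), n_classes)
    assigned = None
    for _ in range(max_iter):
        # E-like step: give each unlabeled document its nearest class.
        new_assigned = [classify(x, centroids) for x in unlabeled]
        if new_assigned == assigned:
            break  # assignments stopped changing
        assigned = new_assigned
        # M-like step: recompute centroids, down-weighting unlabeled documents.
        centroids = weighted_centroids(
            labeled + unlabeled,
            labels + assigned,
            [1.0] * len(labeled) + [unlabeled_weight] * len(unlabeled),
            n_classes,
        )
    return centroids, assigned
```

In this sketch, setting `unlabeled_weight` to 0 reduces the procedure to a nearest-centroid classifier trained on labeled data only, while 1.0 treats provisional labels as fully trusted; a value in between is one simple way to express the balance between labeled and unlabeled data that the abstract mentions.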

Keywords :

expectation-maximisation algorithm; pattern classification; pattern clustering; text analysis; classification algorithm; expectation maximization process; labeled training samples; text classification; weighted k-means clustering scheme; Accuracy; classification; clustering; feature selection;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on

Conference_Location :

Chengdu

Print_ISBN :

978-1-4244-5537-9

Type :

conf

DOI :

10.1109/ICCSIT.2010.5564473

Filename :

5564473

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1947112