Self-Training with Selection-by-Rejection

Author

Zhou, Yan; Kantarcioglu, Murat; Thuraisingham, Bhavani

Author_Institution

Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA

fYear

2012

fDate

10-13 Dec. 2012

Firstpage

795

Lastpage

803

Abstract

Practical machine learning and data mining problems often face a shortage of labeled training data. Self-training algorithms are among the earliest attempts to use unlabeled data to enhance learning. Traditional self-training algorithms label the unlabeled instances on which classifiers trained on the limited labeled data have the highest confidence. In this paper, a self-training algorithm that decreases the disagreement region of hypotheses is presented. The algorithm supplements the training set with self-labeled instances. Only instances that greatly reduce the disagreement region of hypotheses are labeled and added to the training set. Empirical results demonstrate that the proposed self-training algorithm can effectively improve classification performance.
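
For orientation, the traditional confidence-based self-training loop that the abstract contrasts against can be sketched as follows. This is a minimal illustration and not the paper's selection-by-rejection algorithm: the one-dimensional nearest-centroid classifier, the margin-based confidence score, and the names `fit_centroids`, `predict_with_confidence`, `self_train`, `per_round`, and `rounds` are all assumptions made for the example.

```python
from statistics import mean


def fit_centroids(points, labels):
    """Hypothetical stand-in classifier: one centroid (mean) per class."""
    return {
        c: mean(p for p, l in zip(points, labels) if l == c)
        for c in set(labels)
    }


def predict_with_confidence(centroids, x):
    """Predict the nearest centroid; confidence is the distance margin
    between the runner-up centroid and the nearest one (needs >= 2 classes)."""
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(points, labels, unlabeled, per_round=1, rounds=10):
    """Traditional self-training baseline: each round, retrain on the current
    training set, then self-label the most confident unlabeled instances."""
    points, labels, pool = list(points), list(labels), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        centroids = fit_centroids(points, labels)
        scored = [(predict_with_confidence(centroids, x), x) for x in pool]
        # Highest-confidence predictions first.
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _margin), x in scored[:per_round]:
            points.append(x)
            labels.append(label)
            pool.remove(x)
    return points, labels


if __name__ == "__main__":
    pts, lbs = self_train([0.0, 10.0], ["a", "b"], [1.0, 9.0, 4.0, 6.5])
    for p, l in zip(pts, lbs):
        print(p, l)
```

According to the abstract, the proposed method replaces the highest-confidence selection step with a criterion that keeps only instances that greatly reduce the disagreement region of hypotheses. That criterion is not specified in this record, so it is not sketched here.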

Keywords

data mining; learning (artificial intelligence); data mining problems; disagreement region; labeled training data; limited training data; machine learning; selection by rejection; self labeled instances; self training algorithm; unlabeled data; Accuracy; Algorithm design and analysis; Distributed databases; Labeling; Noise; Semisupervised learning; Training; self-training; semi-supervised learning

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Mining (ICDM), 2012 IEEE 12th International Conference on

Conference_Location

Brussels

ISSN

1550-4786

Print_ISBN

978-1-4673-4649-8

Type

conf

DOI

10.1109/ICDM.2012.56

Filename

6413850