Title :
Self-Training with Selection-by-Rejection
Author :
Zhou, Yan ; Kantarcioglu, Murat ; Thuraisingham, Bhavani
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
Abstract :
Practical machine learning and data mining problems often face a shortage of labeled training data. Self-training algorithms are among the earliest attempts at using unlabeled data to enhance learning. Traditional self-training algorithms label the unlabeled data on which classifiers trained on limited training data have the highest confidence. In this paper, a self-training algorithm that decreases the disagreement region of hypotheses is presented. The algorithm supplements the training set with self-labeled instances. Only instances that greatly reduce the disagreement region of hypotheses are labeled and added to the training set. Empirical results demonstrate that the proposed self-training algorithm can effectively improve classification performance.
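The traditional, confidence-based self-training that the abstract contrasts against can be sketched roughly as below. This is not the paper's selection-by-rejection algorithm, whose selection rule the abstract does not spell out. The nearest-centroid classifier, the distance-gap confidence score, and the names `fit_centroids`, `predict_with_confidence`, `self_train`, `per_round`, and `max_rounds` are all illustrative assumptions.

```python
import math


def fit_centroids(X, y):
    """Return {label: centroid} computed from the labeled points."""
    sums, counts = {}, {}
    for point, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_with_confidence(centroids, point):
    """Predict the nearest centroid's label.

    Confidence (an assumed stand-in score) is the gap between the
    distance to the runner-up centroid and the distance to the winner.
    """
    ranked = sorted((math.dist(point, c), lab) for lab, c in centroids.items())
    best_dist, best_label = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else float("inf")
    return best_label, runner_up - best_dist


def self_train(X_lab, y_lab, X_unlab, per_round=1, max_rounds=10):
    """Confidence-based self-training: each round, retrain on the current
    labeled set, then move the `per_round` most confident unlabeled points
    into it using their predicted labels."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        centroids = fit_centroids(X_lab, y_lab)
        scored = [(predict_with_confidence(centroids, p), p) for p in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _confidence), point in scored[:per_round]:
            X_lab.append(point)
            y_lab.append(label)
            pool.remove(point)
    return fit_centroids(X_lab, y_lab), X_lab, y_lab


# Toy data: two labeled seeds and four unlabeled points.
seeds = [(0.0, 0.0), (4.0, 4.0)]
seed_labels = ["a", "b"]
unlabeled = [(0.5, 0.2), (3.8, 4.1), (1.0, 1.0), (3.0, 3.0)]
model, X_all, y_all = self_train(seeds, seed_labels, unlabeled)
```

On this toy data the loop absorbs every unlabeled point, one per round, starting with the point the seed classifier is most sure about. A self-labeling mistake made early is never revisited, which is the weakness of purely confidence-driven selection.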
Keywords :
data mining; learning (artificial intelligence); data mining problems; disagreement region; labeled training data; limited training data; machine learning; selection by rejection; self labeled instances; self training algorithm; unlabeled data; Accuracy; Algorithm design and analysis; Distributed databases; Labeling; Noise; Semisupervised learning; Training; self-training; semi-supervised learning
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.56