DocumentCode
2984376
Title
Self-Training with Selection-by-Rejection
Author
Zhou, Yan; Kantarcioglu, Murat; Thuraisingham, Bhavani
Author_Institution
Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
fYear
2012
fDate
10-13 Dec. 2012
Firstpage
795
Lastpage
803
Abstract
Practical machine learning and data mining problems often face a shortage of labeled training data. Self-training algorithms are among the earliest attempts to use unlabeled data to enhance learning. Traditional self-training algorithms label the unlabeled instances on which classifiers trained on limited training data have the highest confidence. In this paper, a self-training algorithm that decreases the disagreement region of hypotheses is presented. The algorithm supplements the training set with self-labeled instances. Only instances that greatly reduce the disagreement region of hypotheses are labeled and added to the training set. Empirical results demonstrate that the proposed self-training algorithm can effectively improve classification performance.
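The traditional, confidence-based self-training loop that the abstract contrasts against can be sketched as follows. This is a minimal illustration only and is not the paper's selection-by-rejection algorithm: the 1-D nearest-centroid classifier, the distance-margin confidence score, and the names `centroids`, `predict_with_confidence`, `self_train`, `per_round`, and `rounds` are all assumptions made for the example.

```python
from statistics import mean


def centroids(labeled):
    """Return {label: mean feature value} for 1-D labeled points."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict_with_confidence(x, cents):
    """Predict the nearest centroid's label (assumes at least two classes).

    Confidence is the margin between the distance to the second-nearest
    centroid and the distance to the nearest one (a hypothetical score).
    """
    ranked = sorted(cents.items(), key=lambda kv: abs(x - kv[1]))
    (best_label, best_c), (_, second_c) = ranked[0], ranked[1]
    return best_label, abs(x - second_c) - abs(x - best_c)


def self_train(labeled, unlabeled, per_round=1, rounds=10):
    """Confidence-based self-training: each round, retrain on the current
    training set, self-label the `per_round` most confident pool instances,
    and move them into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(labeled)
        scored = sorted(
            pool,
            key=lambda x: predict_with_confidence(x, cents)[1],
            reverse=True,
        )
        for x in scored[:per_round]:
            labeled.append((x, predict_with_confidence(x, cents)[0]))
            pool.remove(x)
    return labeled, pool


# Two labeled seeds and five unlabeled points (toy data, invented here).
seed = [(0.0, "neg"), (10.0, "pos")]
pool = [1.0, 2.0, 8.0, 9.0, 4.0]
final_labeled, remaining = self_train(seed, pool, per_round=2, rounds=3)
```

In this toy run the points closest to the seeds are self-labeled first and the ambiguous point near the middle is labeled last. According to the abstract, the paper replaces this highest-confidence criterion with one that admits only instances that greatly reduce the disagreement region of hypotheses; that criterion is not reproduced here.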
Keywords
data mining; learning (artificial intelligence); data mining problems; disagreement region; labeled training data; limited training data; machine learning; selection by rejection; self labeled instances; self training algorithm; unlabeled data; Accuracy; Algorithm design and analysis; Distributed databases; Labeling; Noise; Semisupervised learning; Training; self-training; semi-supervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location
Brussels
ISSN
1550-4786
Print_ISBN
978-1-4673-4649-8
Type
conf
DOI
10.1109/ICDM.2012.56
Filename
6413850