• DocumentCode
    2984376
  • Title
    Self-Training with Selection-by-Rejection
  • Author
    Zhou, Yan; Kantarcioglu, Murat; Thuraisingham, Bhavani
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
  • fYear
    2012
  • fDate
    10-13 Dec. 2012
  • Firstpage
    795
  • Lastpage
    803
  • Abstract
    Practical machine learning and data mining problems often face a shortage of labeled training data. Self-training algorithms are among the earliest attempts to use unlabeled data to enhance learning. Traditional self-training algorithms label the unlabeled instances on which classifiers trained on the limited labeled data have the highest confidence. In this paper, a self-training algorithm that decreases the disagreement region of hypotheses is presented. The algorithm supplements the training set with self-labeled instances. Only instances that greatly reduce the disagreement region of hypotheses are labeled and added to the training set. Empirical results demonstrate that the proposed self-training algorithm can effectively improve classification performance.
  • Keywords
    data mining; learning (artificial intelligence); data mining problems; disagreement region; labeled training data; limited training data; machine learning; selection by rejection; self labeled instances; self training algorithm; unlabeled data; Accuracy; Algorithm design and analysis; Distributed databases; Labeling; Noise; Semisupervised learning; Training; self-training; semi-supervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4673-4649-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2012.56
  • Filename
    6413850
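
For orientation, the traditional confidence-based self-training loop that the abstract contrasts against can be sketched as follows. This is only an illustrative baseline, not the paper's selection-by-rejection method: the record gives no details of how the disagreement region is measured, so none are assumed here. The nearest-centroid classifier, the margin-based confidence score, and all names (`centroids`, `predict_with_confidence`, `self_train`, `per_round`, `min_margin`) are assumptions made for this sketch. It expects at least two classes in the labeled set.

```python
from math import dist  # Python 3.8+


def centroids(X, y):
    """Mean point of each class in the labeled set."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_with_confidence(cents, x):
    """Return (label, margin) for x.

    The margin is the gap between the distances to the two nearest
    class centroids; a larger gap is treated as higher confidence.
    """
    ranked = sorted((dist(x, c), label) for label, c in cents.items())
    (d0, label), (d1, _) = ranked[0], ranked[1]
    return label, d1 - d0


def self_train(X_lab, y_lab, X_unl, per_round=1, min_margin=0.0):
    """Repeatedly self-label the most confident unlabeled points.

    Each round retrains on the current labeled set, scores the pool,
    and moves up to `per_round` points whose margin exceeds
    `min_margin` into the labeled set. Stops when the pool is empty
    or no point is confident enough.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    while pool:
        cents = centroids(X_lab, y_lab)
        scored = []
        for i, x in enumerate(pool):
            label, margin = predict_with_confidence(cents, x)
            if margin > min_margin:
                scored.append((margin, i, label))
        if not scored:
            break
        scored.sort(reverse=True)  # most confident first
        chosen = scored[:per_round]
        for _, i, label in chosen:
            X_lab.append(pool[i])
            y_lab.append(label)
        taken = {i for _, i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return X_lab, y_lab, pool
```

A call such as `self_train([(0.0, 0.0), (10.0, 10.0)], ["a", "b"], unlabeled_points)` returns the enlarged labeled set together with any points left unlabeled. The method summarized in the abstract would replace the confidence-based selection step with a criterion based on how much a candidate reduces the disagreement region of hypotheses.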