• DocumentCode
    855001
  • Title
    Learning from labeled and unlabeled data using a minimal number of queries
  • Author
    Kothari, Ravi; Jain, Vivek
  • Author_Institution
    IBM India Res. Lab., Indian Inst. of Technol., Hauz Khas, India
  • Volume
    14
  • Issue
    6
  • fYear
    2003
  • Firstpage
    1496
  • Lastpage
    1505
  • Abstract
    The considerable time and expense required for labeling data has prompted the development of algorithms which maximize the classification accuracy for a given amount of labeling effort. On the one hand, the effort has been to develop the so-called "active learning" algorithms which sequentially choose the patterns to be explicitly labeled so as to realize the maximum information gain from each labeling. On the other hand, the effort has been to develop algorithms that can learn from labeled as well as the more abundant unlabeled data. Proposed in this paper is an algorithm that integrates the benefits of active learning with the benefits of learning from labeled and unlabeled data. Our approach is based on reversing the roles of the labeled and unlabeled data. Specifically, we use a Genetic Algorithm (GA) to iteratively refine the class membership of the unlabeled patterns so that the maximum a posteriori (MAP) based predicted labels of the patterns in the labeled dataset are in agreement with the known labels. This reversal of the role of labeled and unlabeled patterns leads to an implicit class assignment of the unlabeled patterns. For active learning, we use a subset of the GA population to construct multiple MAP classifiers. Points in the input space where there is maximal disagreement amongst these classifiers are then selected for explicit labeling. The learning from labeled and unlabeled data and active learning phases are interlaced and together provide accurate classification while minimizing the labeling effort.
  • Keywords
    genetic algorithms; iterative methods; learning (artificial intelligence); maximum likelihood estimation; query processing; active learning algorithm; active learning phase; data querying; expectation-maximization; genetic algorithm; maximum a posteriori; query number; supervised learning; unlabeled data; Error analysis; Genetic algorithms; Geometry; Iterative algorithms; Labeling; Supervised learning; Unsupervised learning
  • fLanguage
    English
  • Journal_Title
    Neural Networks, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9227
  • Type
    jour
  • DOI
    10.1109/TNN.2003.820446
  • Filename
    1257412
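
The two phases described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the density model or the GA operators, so everything below is an assumption made for illustration. It assumes two classes, unit-variance Gaussian class-conditional densities for the MAP classifier, truncation selection, one-point crossover, and single bit-flip mutation. All function names (fit_map, map_predict, fitness, evolve, most_disputed) are hypothetical.

```python
import math
import random

def fit_map(points, labels, n_classes=2):
    """Per-class prior and mean estimated from tentatively labeled points."""
    model = []
    for c in range(n_classes):
        members = [p for p, y in zip(points, labels) if y == c]
        if not members:
            model.append(None)  # this labeling leaves class c empty
            continue
        mean = [sum(col) / len(members) for col in zip(*members)]
        model.append((len(members) / len(points), mean))
    return model

def map_predict(model, x):
    """Class maximizing log prior + log likelihood (assumed unit-variance Gaussian)."""
    best, best_score = None, -math.inf
    for c, params in enumerate(model):
        if params is None:
            continue
        prior, mean = params
        score = math.log(prior) - 0.5 * sum((a - b) ** 2 for a, b in zip(x, mean))
        if score > best_score:
            best, best_score = c, score
    return best

def fitness(individual, unlabeled, labeled_x, labeled_y):
    """Fraction of labeled points whose MAP prediction matches the known label."""
    model = fit_map(unlabeled, individual)
    hits = sum(map_predict(model, x) == y for x, y in zip(labeled_x, labeled_y))
    return hits / len(labeled_x)

def evolve(unlabeled, labeled_x, labeled_y, pop_size=20, generations=30, seed=0):
    """GA over label assignments of the unlabeled points; returns the sorted population."""
    rng = random.Random(seed)
    n = len(unlabeled)

    def score(ind):
        return fitness(ind, unlabeled, labeled_x, labeled_y)

    pop = [[rng.randrange(2) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]      # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)            # single bit-flip mutation
            child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    pop.sort(key=score, reverse=True)
    return pop

def most_disputed(committee, unlabeled, candidates):
    """Candidate on which the committee's MAP classifiers split most evenly."""
    models = [fit_map(unlabeled, ind) for ind in committee]

    def disagreement(x):
        ones = sum(map_predict(m, x) for m in models)
        return min(ones, len(models) - ones)

    return max(candidates, key=disagreement)
```

In an interlaced loop of the kind the abstract describes, one would call evolve, pass the top few individuals to most_disputed as the committee, obtain an explicit label for the returned point, add it to the labeled set, and repeat. Points are given as tuples of floats, and the sketch needs at least two unlabeled points and a population of at least four.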