• DocumentCode
    679523
  • Title
    Maximizing Expected Model Change for Active Learning in Regression
  • Author
    Wenbin Cai; Ya Zhang; Jun Zhou
  • Author_Institution
    Shanghai Key Laboratory of Multimedia Processing and Transmissions, Shanghai Jiao Tong University, Shanghai, China
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    51
  • Lastpage
    60
  • Abstract
    Active learning is well motivated in many supervised learning tasks where unlabeled data are abundant but labeled examples are expensive to obtain. The goal of active learning is to maximize the performance of a learning model using as few labeled training examples as possible, thereby minimizing the cost of data annotation. So far, work on active learning for regression remains very limited. In this paper, we propose a new active learning framework for regression called Expected Model Change Maximization (EMCM), which aims to choose the examples that lead to the largest change in the current model. The model change is measured as the difference between the current model parameters and the updated parameters obtained after training on the enlarged training set. Inspired by the Stochastic Gradient Descent (SGD) update rule, the change is estimated as the gradient of the loss with respect to a candidate example. Under this framework, we derive novel active learning algorithms for both linear and nonlinear regression to select the most informative examples. Extensive experiments on benchmark data sets from the UCI Machine Learning Repository demonstrate that the proposed algorithms are highly effective at choosing the most informative examples and robust to various types of data distributions.
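    The selection rule described in the abstract can be sketched in code. The snippet below is a minimal illustration, not the paper's exact algorithm: it scores each unlabeled candidate by the norm of the SGD gradient of the squared loss that the candidate would induce on a linear model, and handles the unknown label by averaging over a small bootstrap ensemble of models. The function name `emcm_select`, the ensemble size `n_boot`, and the use of ordinary least squares are all illustrative assumptions.

    ```python
    import numpy as np

    def emcm_select(X_labeled, y_labeled, X_pool, n_boot=4, rng=None):
        """Return the pool index with the largest expected model change.

        Illustrative EMCM-style sketch for linear regression: the change
        caused by adding candidate x is approximated by the SGD gradient
        norm ||(w.x - y) x||, and the unknown label y is replaced by the
        predictions of a bootstrap ensemble (an assumed design choice).
        """
        rng = np.random.default_rng(rng)
        n = len(y_labeled)
        # Current model: ordinary least squares on the labeled set.
        w, *_ = np.linalg.lstsq(X_labeled, y_labeled, rcond=None)
        # Bootstrap ensemble to stand in for the unknown true label.
        ensemble = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)
            wb, *_ = np.linalg.lstsq(X_labeled[idx], y_labeled[idx], rcond=None)
            ensemble.append(wb)
        scores = []
        for x in X_pool:
            pred = x @ w
            # Expected gradient norm, averaged over ensemble label guesses.
            score = np.mean(
                [np.linalg.norm((pred - (x @ wb)) * x) for wb in ensemble]
            )
            scores.append(score)
        return int(np.argmax(scores))
    ```

    In an active learning loop one would query the label of the returned example, add it to the labeled set, retrain, and repeat.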
  • Keywords
    expectation-maximisation algorithm; gradient methods; learning (artificial intelligence); regression analysis; stochastic processes; EMCM; SGD update rule; UCI machine learning repository; active learning algorithms; active learning framework; data annotation; data distributions; enlarged training set; expected model change maximization; labeled training data; nonlinear regression; stochastic gradient descent update rule; supervised learning; unlabeled data; Current measurement; Data models; Linear regression; Machine learning algorithms; Regression tree analysis; Training; Training data; Active learning; Expected Model Change Maximization; Linear Regression; Nonlinear regression;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2013 IEEE 13th International Conference on Data Mining (ICDM)
  • Conference_Location
    Dallas, TX
  • ISSN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2013.104
  • Filename
    6729489