• DocumentCode
    27613
  • Title

    Active Learning With Imbalanced Multiple Noisy Labeling

  • Author

    Jing Zhang ; Xindong Wu ; Sheng, Victor S.

  • Author_Institution
    Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
  • Volume
    45
  • Issue
    5
  • fYear
    2015
  • fDate
    May 2015
  • Firstpage
    1081
  • Lastpage
    1093
  • Abstract
    With crowdsourcing systems, it is easy to collect multiple noisy labels for the same object for supervised learning. This dynamic annotation procedure fits the active learning perspective and accompanies the imbalanced multiple noisy labeling problem. This paper proposes a novel active learning framework with multiple imperfect annotators involved in crowdsourcing systems. The framework contains two core procedures: label integration and instance selection. In the label integration procedure, a positive label threshold (PLAT) algorithm is introduced to induce the class membership from the multiple noisy label set of each instance in a training set. PLAT solves the imbalanced labeling problem by dynamically adjusting the threshold for determining the class membership of an example. Furthermore, three novel instance selection strategies are proposed to adapt PLAT for improving the learning performance. These strategies are respectively based on the uncertainty derived from the multiple labels, the uncertainty derived from the learned model, and the combination method (CFI). Experimental results on 12 datasets with different underlying class distributions demonstrate that the three novel instance selection strategies significantly improve the learning performance, and CFI has the best performance when labeling behaviors exhibit different levels of imbalance in crowdsourcing systems. We also apply our methods to a real-world scenario, obtaining noisy labels from Amazon Mechanical Turk, and show that our proposed strategies achieve very high performance.
  • Keywords
    data integration; learning (artificial intelligence); pattern classification; Amazon Mechanical Turk; CFI; PLAT algorithm; active learning framework; class distributions; class membership; combination method; crowdsourcing systems; dynamic annotation procedure; imbalanced multiple noisy labeling; instance selection strategies; label integration; learning performance; multiple noisy label set; positive label threshold algorithm; supervised learning; training set; Accuracy; Crowdsourcing; Labeling; Measurement uncertainty; Noise measurement; Training; Uncertainty; Active learning; crowdsourcing; imbalanced learning; repeated labeling; supervised classification;
  • fLanguage
    English
  • Journal_Title
    Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-2267
  • Type

    jour

  • DOI
    10.1109/TCYB.2014.2344674
  • Filename
    6878424