  • DocumentCode
    26041
  • Title
    Imbalanced Multiple Noisy Labeling
  • Author
    Jing Zhang ; Xindong Wu ; Victor S. Sheng
  • Author_Institution
    Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
  • Volume
    27
  • Issue
    2
  • fYear
    2015
  • fDate
    Feb. 1 2015
  • Firstpage
    489
  • Lastpage
    503
  • Abstract
    It can be easy to collect multiple noisy labels for the same object via Internet-based crowdsourcing systems. Labelers may be biased when labeling because of a lack of expertise or dedication, or because of personal preference; such bias causes imbalanced multiple noisy labeling. In most cases, we have no information about the labeling qualities of the labelers or the underlying class distributions, so it is important to design agnostic solutions that use these noisy labels for supervised learning. We first investigate how imbalanced multiple noisy labeling affects the class distributions of training sets and the performance of classification. We then propose an agnostic algorithm, Positive LAbel frequency Threshold (PLAT), to deal with the imbalanced labeling issue. Simulations on eight UCI data sets with different underlying class distributions show that PLAT not only effectively handles imbalanced multiple noisy labeling problems that off-the-shelf agnostic methods cannot cope with, but also performs nearly as well as majority voting when there is no imbalance. We also apply PLAT to eight real-world data sets with imbalanced labels collected from Amazon Mechanical Turk, and the experimental results show that PLAT is efficient and outperforms other ground truth inference algorithms.
  • Keywords
    inference mechanisms; learning (artificial intelligence); Amazon Mechanical Turk; Internet-based crowd sourcing system; PLAT algorithm; UCI data set; ground truth inference algorithm; imbalanced multiple noisy labeling; label class distribution; labeling quality; positive label frequency threshold algorithm; supervised learning; Accuracy; Educational institutions; Labeling; Noise measurement; Supervised learning; Training; Imbalanced noisy labeling; crowdsourcing; imbalanced learning; repeated labeling;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2327039
  • Filename
    6823124