Title :
A Threshold Method for Imbalanced Multiple Noisy Labeling
Author :
Jing Zhang; Xindong Wu; Victor S. Sheng
Author_Institution :
Dept. of Comput. Sci., Hefei Univ. of Technol., Hefei, China
Abstract :
Internet-based crowdsourcing systems can be viewed as a kind of loosely coupled social network. With these systems, it is easy to collect multiple noisy labels for the same object when annotating data for supervised learning. Because non-expert labelers lack expertise and dedication and have strong personal preferences, their labels may be biased. This bias gives rise to Imbalanced Multiple Noisy Labeling. In this paper, we propose an agnostic algorithm, Positive LAbel frequency Threshold (PLAT), to deal with imbalanced labeling. Because of the dynamics of social networks, in most cases no information about the qualities of labelers or the underlying class distributions can be acquired. PLAT requires no prior knowledge of the labeling qualities of labelers, the underlying class distributions, or the level of labeling imbalance. Simulations on eight real-world datasets with different underlying class distributions demonstrate that PLAT not only effectively deals with imbalanced multiple noisy labeling that off-the-shelf agnostic methods cannot cope with, but also performs nearly the same as majority voting when labelers have no bias.
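The contrast the abstract draws between majority voting and a threshold on positive label frequency can be illustrated with a minimal sketch. This is not the PLAT algorithm: the abstract does not say how PLAT chooses its threshold, so the `threshold` parameter, the function names, and the toy label sets below are all hypothetical and serve only to show how a fixed 0.5 cutoff (majority voting) differs from a lowered cutoff when labelers are biased toward the negative class.

```python
from typing import List


def positive_frequency(labels: List[int]) -> float:
    """Fraction of positive (1) labels among the noisy labels of one object."""
    if not labels:
        raise ValueError("each object needs at least one label")
    return sum(labels) / len(labels)


def integrate_labels(multi_labels: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Assign class 1 when the positive label frequency exceeds `threshold`.

    With threshold=0.5 and an odd number of labels per object this is plain
    majority voting. The threshold is a hypothetical, hand-set parameter here;
    the abstract does not describe how PLAT estimates it.
    """
    return [1 if positive_frequency(labels) > threshold else 0
            for labels in multi_labels]


# Hypothetical noisy label sets: five binary labels per object.
noisy = [
    [1, 0, 0, 0, 1],  # positive frequency 0.4
    [0, 0, 0, 0, 0],  # positive frequency 0.0
    [1, 1, 0, 1, 0],  # positive frequency 0.6
    [0, 1, 0, 0, 0],  # positive frequency 0.2
]

majority = integrate_labels(noisy, threshold=0.5)
lowered = integrate_labels(noisy, threshold=0.3)
print(majority)  # [0, 0, 1, 0]
print(lowered)   # [1, 0, 1, 0]
```

In this toy data the first object is labeled negative under majority voting but positive once the cutoff is lowered, which is the kind of correction a frequency threshold can make when positive labels are systematically under-reported.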
Keywords :
Internet; learning (artificial intelligence); social networking (online); Internet-based crowdsourcing systems; PLAT; class distributions; imbalanced multiple noisy labeling; positive label frequency threshold method; social network dynamics; supervised learning; Accuracy; Conferences; Data mining; Labeling; Noise measurement; Social network services; Training; classification; crowdsourcing; imbalance labeling; multiple noisy labels; outsourcing;
Conference_Title :
Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on
Conference_Location :
Niagara Falls, ON