Title :
Label noise correction methods
Author :
Bryce Nicholson; Jing Zhang; Victor S. Sheng; Zhiheng Wang
Author_Institution :
Computer Science Department, University of Central Arkansas
Abstract :
The important task of correcting label noise is rarely addressed in the literature. The difficulty of developing a robust label correction algorithm leads to this silence. To break it, we propose two algorithms for correcting label noise. The first, called Self-Training Correction (STC), uses self-training to re-label noisy instances. The second, called Cluster-based Correction (CC), is a clustering-based method that groups instances together to infer their ground-truth labels. We also adapt an algorithm from previous work: Polishing, a consensus-based method that consults an ensemble of classifiers to change the values of both attributes and labels. We simplify Polishing so that it alters only the labels of instances, and call the result Polishing Labels (PL). We experimentally compare our novel methods with PL by examining their improvements in label quality, model quality, and AUC on binary and multi-class data sets, and conclude that only CC consistently and significantly improves all three. STC and PL improve these metrics in some cases, but less reliably. Hence, Cluster-based Correction is the best of the three methods.
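The abstract describes CC only at a high level: instances are grouped, and each group is used to infer the ground-truth labels of its members. As a rough illustration only, and not the authors' CC algorithm, the sketch below assumes a single k-means run with a fixed, hypothetical k and replaces every instance's label with the majority label of its cluster. The names `kmeans` and `cluster_correct` are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign


def cluster_correct(points, labels, k, seed=0):
    """Hypothetical correction: relabel each instance with its cluster's majority label."""
    assign = kmeans(points, k, seed=seed)
    majority = {}
    for c in set(assign):
        votes = Counter(lab for lab, a in zip(labels, assign) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    return [majority[a] for a in assign]


if __name__ == "__main__":
    # Two well-separated groups, each containing one flipped (noisy) label.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    noisy = ["a", "a", "a", "b", "b", "b", "b", "a"]
    print(cluster_correct(pts, noisy, k=2))
```

A single clustering with one k is fragile: if k is poorly chosen, a cluster can mix classes and the majority vote will overwrite correct labels. Combining votes from several clusterings is one possible way to reduce that risk, but this sketch does not attempt it.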
Keywords :
"Clustering algorithms","Noise measurement","Diversity reception","Data models","Training","Yttrium"
Conference_Titel :
Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on
Print_ISBN :
978-1-4673-8272-4
DOI :
10.1109/DSAA.2015.7344791