Title :
Active Label Correction
Author :
Rebbapragada, Umaa ; Brodley, Carla E. ; Sulla-Menashe, Damien ; Friedl, M.A.
Author_Institution :
Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
Abstract :
Active Label Correction (ALC) is an interactive method that cleans an established training set of mislabeled examples in conjunction with a domain expert. ALC presumes that the expert who conducts this review is either more accurate than the original annotator or has access to additional resources that ensure a high quality label. A high-cost re-review is possible because ALC proceeds iteratively, scoring the full training set but selecting only small batches of examples that are likely mislabeled. The expert reviews each batch and corrects any mislabeled examples, after which the classifier is retrained and the process repeats until the expert terminates it. We compare several instantiations of ALC to fully-automated methods that attempt to discard or correct label noise in a single pass. Our empirical results show that ALC outperforms single-pass methods in terms of selection efficiency and classifier accuracy. We evaluate the best ALC instantiation on our motivating task of detecting mislabeled and poorly formulated sites within a land cover classification training set from the geography domain.
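The iterative loop described in the abstract (score the full training set, select a small batch of likely mislabeled examples, have the expert correct them, retrain, repeat) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier, the distance-based `mislabel_score`, the fixed number of rounds standing in for the expert's decision to stop, the `oracle` callback standing in for the domain expert, and the toy one-dimensional data.

```python
def fit_centroids(xs, ys):
    """Train a toy nearest-centroid classifier: one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def mislabel_score(x, y, centroids):
    """Higher score means the assigned label y looks less plausible for x."""
    others = [abs(x - m) for c, m in centroids.items() if c != y]
    if not others:  # only one class present, nothing to compare against
        return 0.0
    return abs(x - centroids[y]) - min(others)


def active_label_correction(xs, noisy_labels, oracle, batch_size=2, rounds=3):
    """Hypothetical ALC loop: score, select a batch, correct, retrain, repeat."""
    labels = list(noisy_labels)
    reviewed = set()
    for _ in range(rounds):  # assumption: fixed rounds replace expert termination
        centroids = fit_centroids(xs, labels)  # retrain on the current labels
        candidates = [i for i in range(len(xs)) if i not in reviewed]
        # Score every unreviewed example, most suspicious first.
        candidates.sort(
            key=lambda i: mislabel_score(xs[i], labels[i], centroids),
            reverse=True,
        )
        for i in candidates[:batch_size]:
            labels[i] = oracle(i)  # the expert confirms or corrects the label
            reviewed.add(i)
    return labels, reviewed


# Toy data: two well-separated clusters with four flipped labels.
xs = [i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]
true_labels = [0] * 20 + [1] * 20
flipped = {3, 7, 25, 31}
noisy_labels = [1 - y if i in flipped else y for i, y in enumerate(true_labels)]

cleaned, reviewed = active_label_correction(
    xs, noisy_labels, oracle=lambda i: true_labels[i]
)
```

In this toy setting the four flipped labels receive the highest scores, so they are reviewed in the first two batches and the third batch only confirms clean examples. Real data would call for a stronger classifier and a scoring rule suited to the task; the instantiations the paper actually compares are not reproduced here.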
Keywords :
geography; learning (artificial intelligence); pattern classification; terrain mapping; ALC method; active label correction; classifier accuracy; classifier retraining; domain expert; geography domain; land cover classification training; selection efficiency; single-pass method; Accuracy; Labeling; Noise; Noise level; Training; Training data; Uncertainty; data cleaning; label noise; land cover classification; supervised learning;
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.162