DocumentCode
3695099
Title
Classifier self-assessment: active learning and active noise correction for document classification
Author
Dominik Henter;Armin Stahl;Markus Ebbecke;Michael Gillmann
Author_Institution
University of Kaiserslautern, Germany
fYear
2015
Firstpage
276
Lastpage
280
Abstract
This paper introduces two novel techniques that improve document classification while reducing the amount of manual work by the user. The first technique applies uncertainty sampling as a metric for batch-mode active learning to suggest only the most interesting documents for the manual labeling process, resulting in a steep improvement even for small training sets. This addresses the problem of creating and improving an initial training set. The second technique focuses on cleaning an existing large set of weakly labeled documents by active noise correction. The classifier's self-assessment is used to detect mislabeled documents which are then reclassified. For active noise correction, two approaches are explored: one based on a human expert and one that automatically corrects the assigned labels.
Keywords
"Integrated circuits","Training"
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type
conf
DOI
10.1109/ICDAR.2015.7333767
Filename
7333767