• DocumentCode
    3695099
  • Title

    Classifier self-assessment: active learning and active noise correction for document classification

  • Author

    Dominik Henter;Armin Stahl;Markus Ebbecke;Michael Gillmann

  • Author_Institution
    University of Kaiserslautern, Germany
  • fYear
    2015
  • Firstpage
    276
  • Lastpage
    280
  • Abstract
    This paper introduces two novel techniques that improve document classification while reducing the amount of manual work by the user. The first technique applies uncertainty sampling as a metric for batch-mode active learning to suggest only the most interesting documents for the manual labeling process, resulting in a steep improvement even for small training sets. This addresses the problem of creating and improving an initial training set. The second technique focuses on cleaning an existing large set of weakly labeled documents by active noise correction. The classifier's self-assessment is used to detect mislabeled documents which are then reclassified. For active noise correction, two approaches are explored: one based on a human expert and one that automatically corrects the assigned labels.
  • Keywords
    "Integrated circuits","Training"
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
  • Type

    conf

  • DOI
    10.1109/ICDAR.2015.7333767
  • Filename
    7333767
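
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say which uncertainty measure, classifier, or thresholds the paper uses. The sketch assumes a least-confidence score (one minus the top class probability), and the function names `least_confidence`, `select_batch`, and `flag_suspects` as well as the `threshold` parameter are hypothetical. Class probabilities are assumed to come from some already trained classifier.

```python
from typing import Dict, List, Sequence, Tuple


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score (assumed measure): 1 minus the top class probability."""
    return 1.0 - max(probs)


def select_batch(probabilities: Dict[str, Sequence[float]],
                 batch_size: int) -> List[str]:
    """Batch-mode uncertainty sampling: pick the documents the classifier
    is least sure about, so that a human labels only those."""
    ranked = sorted(
        probabilities,
        key=lambda doc_id: least_confidence(probabilities[doc_id]),
        reverse=True,
    )
    return ranked[:batch_size]


def flag_suspects(probabilities: Dict[str, Sequence[float]],
                  labels: Dict[str, int],
                  threshold: float = 0.8) -> List[Tuple[str, int, int]]:
    """Noise detection: flag documents whose assigned label disagrees with a
    confident prediction. Returns (doc_id, assigned_label, predicted_label).
    The threshold value is an illustrative assumption."""
    suspects = []
    for doc_id, probs in probabilities.items():
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        if predicted != labels[doc_id] and probs[predicted] >= threshold:
            suspects.append((doc_id, labels[doc_id], predicted))
    return suspects


# Toy class probabilities for four documents and two classes (made-up values).
example_probs = {
    "a": [0.90, 0.10],
    "b": [0.55, 0.45],
    "c": [0.60, 0.40],
    "d": [0.05, 0.95],
}
example_labels = {"a": 0, "b": 0, "c": 1, "d": 0}

to_label = select_batch(example_probs, batch_size=2)
suspects = flag_suspects(example_probs, example_labels)
```

In this toy data, `to_label` contains the two documents closest to the decision boundary, and `suspects` contains the one document whose assigned label contradicts a confident prediction. A flagged document could then be shown to a human expert or relabeled with the predicted class, which corresponds to the two correction approaches the abstract mentions.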