• DocumentCode
    2143697
  • Title
    Sample-Dependent Feature Selection for Faster Document Image Categorization
  • Author
    Louradour, Jérôme ; Kermorvant, Christopher

  • Author_Institution
    A2iA, Paris, France
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    309
  • Lastpage
    313
  • Abstract
    In document image classification, some classes of documents can be easily identified using pixel-level features, whereas some distinctions can only be made using semantics, which usually involves a full automatic text transcription. To be as efficient as possible, the classification system should avoid extracting high-level, time-consuming features when they are not needed to classify with confidence. We introduce this issue of sample-dependent feature selection, which, as far as we know, has not been addressed before. We propose a method to tackle this problem that can be generalized to any classifier that provides a confidence score along with its prediction. Empirical results using AdaBoost on three mail classification problems show that our approach significantly improves classification efficiency (up to 40% less CPU time) without significant loss of accuracy compared to the baseline.
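    The confidence-gated idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `classify_with_early_exit`, the `stages` structure, and the default threshold of 0.9 are all hypothetical, and each classifier is assumed to return a label together with a confidence score in [0, 1].

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: an extractor maps a document to a list of features,
# and a classifier maps the features gathered so far to (label, confidence).
Extractor = Callable[[str], List[float]]
Classifier = Callable[[List[float]], Tuple[str, float]]


def classify_with_early_exit(
    doc: str,
    stages: Sequence[Tuple[Extractor, Classifier]],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[str, float, int]:
    """Run stages from cheapest to costliest and stop once confident.

    Returns (label, confidence, number_of_stages_used).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features: List[float] = []
    label, confidence, used = "", 0.0, 0
    for used, (extract, classify) in enumerate(stages, start=1):
        # Pay for this stage's features only if we reached this stage.
        features = features + extract(doc)
        label, confidence = classify(features)
        if confidence >= threshold:
            break  # confident enough: skip the remaining, costlier extractors
    return label, confidence, used
```

    In such a sketch, `stages` would be ordered from cheap pixel-level features to costly transcription-based ones, so that easy documents never pay for the expensive extraction.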
  • Keywords
    document image processing; feature extraction; image classification; learning (artificial intelligence); text analysis; AdaBoost; automatic text transcription; document image categorization; document image classification; pixel-level features; sample-dependent feature selection; Accuracy; Calibration; Databases; Error analysis; Estimation; Feature extraction; Machine learning; Image document classification; confidence-rated multi-label classification; feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.70
  • Filename
    6065325