• DocumentCode
    2011037
  • Title
    Scanning Neural Network for Text Line Recognition
  • Author
    Rashid, Sheikh Faisal ; Shafait, Faisal ; Breuel, Thomas M.
  • Author_Institution
    Dept. of Comput. Sci., Tech. Univ. Kaiserslautern, Kaiserslautern, Germany
  • fYear
    2012
  • fDate
    27-29 March 2012
  • Firstpage
    105
  • Lastpage
    109
  • Abstract
    Optical character recognition (OCR) of machine-printed Latin-script documents is widely claimed to be a solved problem. However, error-free OCR of degraded or noisy text remains challenging for modern OCR systems. Most recent approaches perform segmentation-based character recognition, which is difficult because segmenting degraded text is itself problematic. This paper describes a segmentation-free text line recognition approach using a multilayer perceptron (MLP) and hidden Markov models (HMMs). A line-scanning neural network, trained with character-level contextual information and a special garbage class, is used to extract class probabilities at every pixel succession. The output of this scanning neural network is decoded by HMMs to provide character-level recognition. In evaluations on a subset of the UNLV-ISRI document collection, we achieve 98.4% character recognition accuracy, which is statistically significantly better than the character recognition accuracies obtained from state-of-the-art open-source OCR systems.
  • Keywords
    feature extraction; hidden Markov models; image segmentation; learning (artificial intelligence); multilayer perceptrons; natural language processing; optical character recognition; probability; text detection; HMM; MLP; UNLV-ISRI document collection; character level contextual information; class probability extraction; degraded text segmentation; error free OCR; hidden Markov models; line scanning neural network training; machine printed Latin script documents; multilayer perceptron; noisy text; open source OCR systems; optical character recognition; pixel succession; segmentation based character recognition; segmentation free text line recognition; Accuracy; Character recognition; Feature extraction; Handwriting recognition; Hidden Markov models; Optical character recognition software; Text recognition; Auto MLP; Hidden Markov Models; Multilayer Perceptron; Optical Character Recognition; Scanning Neural Network; Segmentation free OCR;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
  • Conference_Location
    Gold Coast, QLD
  • Print_ISBN
    978-1-4673-0868-7
  • Type
    conf
  • DOI
    10.1109/DAS.2012.77
  • Filename
    6195344
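
The decoding step described in the abstract (per-column class probabilities from a scanning network, decoded into characters) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no details of the MLP or the HMM topology, so the sketch assumes the posteriors are already available, uses a single state per class with a uniform transition model, and the names `viterbi`, `collapse`, the `stay` parameter, and the `"~"` garbage label are all hypothetical.

```python
import math


def viterbi(posteriors, stay=0.5):
    """Return the most likely class index for each pixel column.

    posteriors: one row per column, each row a list of class probabilities
                (assumed to come from some scanning classifier).
    stay:       assumed probability of remaining in the same class between
                adjacent columns; the remainder is split evenly among the
                other classes. This is an illustrative choice, not a value
                from the paper.
    """
    n_classes = len(posteriors[0])
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n_classes - 1))
    eps = 1e-12  # avoids log(0) for zero probabilities

    score = [math.log(p + eps) for p in posteriors[0]]
    back = []  # back[t][j] = best predecessor of class j at column t + 1
    for row in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_classes):
            def total(i, j=j):
                return score[i] + (log_stay if i == j else log_move)
            best_i = max(range(n_classes), key=total)
            new_score.append(total(best_i) + math.log(row[j] + eps))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final class.
    state = max(range(n_classes), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def collapse(path, labels, garbage):
    """Merge runs of the same class and drop the garbage class.

    Under this simplification, a repeated character is only recovered
    when garbage columns separate its two occurrences.
    """
    out = []
    prev = None
    for s in path:
        if s != prev and labels[s] != garbage:
            out.append(labels[s])
        prev = s
    return "".join(out)


# Toy posteriors for classes "a", "b", and the garbage class "~".
labels = ["a", "b", "~"]
posteriors = [
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.90, 0.05],
]
print(collapse(viterbi(posteriors), labels, garbage="~"))  # prints "abb"
```

In this toy setup the garbage class plays the role of a separator between characters, which is why the two adjacent "b" runs are kept distinct; a real system would use trained transition probabilities and likely several HMM states per character.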