DocumentCode
2011037
Title
Scanning Neural Network for Text Line Recognition
Author
Rashid, Sheikh Faisal ; Shafait, Faisal ; Breuel, Thomas M.
Author_Institution
Dept. of Comput. Sci., Tech. Univ. Kaiserslautern, Kaiserslautern, Germany
fYear
2012
fDate
27-29 March 2012
Firstpage
105
Lastpage
109
Abstract
Optical character recognition (OCR) of machine-printed Latin-script documents is widely claimed to be a solved problem. However, error-free OCR of degraded or noisy text remains challenging for modern OCR systems. Most recent approaches perform segmentation-based character recognition, which is difficult because segmenting degraded text is itself problematic. This paper describes a segmentation-free text line recognition approach using a multilayer perceptron (MLP) and hidden Markov models (HMMs). A line-scanning neural network, trained with character-level contextual information and a special garbage class, is used to extract class probabilities at every pixel succession. The output of this scanning neural network is decoded by HMMs to provide character-level recognition. In evaluations on a subset of the UNLV-ISRI document collection, we achieve 98.4% character recognition accuracy, which is statistically significantly better than the character recognition accuracies obtained from state-of-the-art open-source OCR systems.
Keywords
feature extraction; hidden Markov models; image segmentation; learning (artificial intelligence); multilayer perceptrons; natural language processing; optical character recognition; probability; text detection; HMM; MLP; UNLV-ISRI document collection; character level contextual information; class probability extraction; degraded text segmentation; error free OCR; hidden Markov models; line scanning neural network training; machine printed Latin script documents; multilayer perceptron; noisy text; open source OCR systems; optical character recognition; pixel succession; segmentation based character recognition; segmentation free text line recognition; Accuracy; Character recognition; Feature extraction; Handwriting recognition; Hidden Markov models; Optical character recognition software; Text recognition; Auto MLP; Hidden Markov Models; Multilayer Perceptron; Optical Character Recognition; Scanning Neural Network; Segmentation free OCR;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location
Gold Coast, QLD
Print_ISBN
978-1-4673-0868-7
Type
conf
DOI
10.1109/DAS.2012.77
Filename
6195344