DocumentCode :
2011037
Title :
Scanning Neural Network for Text Line Recognition
Author :
Rashid, Sheikh Faisal ; Shafait, Faisal ; Breuel, Thomas M.
Author_Institution :
Dept. of Comput. Sci., Tech. Univ. Kaiserslautern, Kaiserslautern, Germany
fYear :
2012
fDate :
27-29 March 2012
Firstpage :
105
Lastpage :
109
Abstract :
Optical character recognition (OCR) of machine-printed Latin-script documents is widely claimed to be a solved problem. However, error-free OCR of degraded or noisy text is still challenging for modern OCR systems. Most recent approaches perform segmentation-based character recognition, which is difficult because segmenting degraded text is itself problematic. This paper describes a segmentation-free text line recognition approach using a multilayer perceptron (MLP) and hidden Markov models (HMMs). A line-scanning neural network, trained with character-level contextual information and a special garbage class, is used to extract class probabilities at each successive pixel position along the line. The output of this scanning neural network is decoded by HMMs to provide character-level recognition. In evaluations on a subset of the UNLV-ISRI document collection, we achieve 98.4% character recognition accuracy, which is statistically significantly better than the character recognition accuracies obtained from state-of-the-art open-source OCR systems.
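The abstract describes a two-stage pipeline: a scanning classifier emits class probabilities at every pixel column, and HMMs decode that sequence into characters. The sketch below illustrates this data flow in plain Python under simplifying assumptions that are not taken from the paper: the trained MLP is replaced by an arbitrary `classify` callable, the HMM uses one state per class with `~` as a hypothetical garbage label, each state has a fixed self-loop probability and uniform transitions otherwise, and log posteriors are used directly as emission scores. The names `scan_columns`, `viterbi_decode`, and `path_to_text` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

GARBAGE = "~"  # hypothetical label for the garbage (non-character) class


def scan_columns(image: List[List[int]], window: int,
                 classify: Callable[[List[List[int]]], List[float]]) -> List[List[float]]:
    """Slide a window across the line image one pixel column at a time.

    `classify` stands in for the trained MLP (an assumption); it maps one
    window (a list of pixel rows) to a vector of class posteriors.
    """
    half = window // 2
    # Zero-pad left and right so every column of the line gets its own window.
    padded = [[0] * half + row + [0] * half for row in image]
    width = len(image[0]) if image else 0
    return [classify([row[x:x + window] for row in padded]) for x in range(width)]


def viterbi_decode(posteriors: Sequence[Sequence[float]], labels: Sequence[str],
                   self_loop: float = 0.8) -> List[int]:
    """Most likely state sequence, with one HMM state per class (assumed topology).

    Each state stays in itself with probability `self_loop` and otherwise moves
    to any other state uniformly; log posteriors serve as emission scores.
    Requires at least two labels.
    """
    if not posteriors:
        return []
    n = len(labels)
    stay = math.log(self_loop)
    move = math.log((1.0 - self_loop) / (n - 1))

    def emit(p: float) -> float:
        return math.log(max(p, 1e-12))  # floor avoids log(0)

    def trans(i: int, j: int) -> float:
        return stay if i == j else move

    score = [emit(p) for p in posteriors[0]]  # uniform initial distribution
    backpointers: List[List[int]] = []
    for column in posteriors[1:]:
        # Best predecessor for each state j at this column.
        best_prev = [max(range(n), key=lambda i: score[i] + trans(i, j)) for j in range(n)]
        score = [score[i] + trans(i, j) + emit(column[j]) for j, i in enumerate(best_prev)]
        backpointers.append(best_prev)

    # Trace back from the best final state.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    return path[::-1]


def path_to_text(path: Sequence[int], labels: Sequence[str]) -> str:
    """Collapse runs of the same state and drop the garbage class."""
    text, prev = [], None
    for state in path:
        if state != prev and labels[state] != GARBAGE:
            text.append(labels[state])
        prev = state
    return "".join(text)


if __name__ == "__main__":
    labels = [GARBAGE, "a", "b"]
    g, a, b = [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]
    print(path_to_text(viterbi_decode([g, a, a, g, b, b, g], labels), labels))  # ab
```

With these peaked posteriors the decoder keeps each run of columns in its most probable state and prints `ab`. Because consecutive identical states are merged, a doubled letter such as "ll" can only be recovered in this sketch when garbage columns separate the two characters.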
Keywords :
feature extraction; hidden Markov models; image segmentation; learning (artificial intelligence); multilayer perceptrons; natural language processing; optical character recognition; probability; text detection; HMM; MLP; UNLV-ISRI document collection; character level contextual information; class probability extraction; degraded text segmentation; error free OCR; hidden Markov models; line scanning neural network training; machine printed Latin script documents; multilayer perceptron; noisy text; open source OCR systems; optical character recognition; pixel succession; segmentation based character recognition; segmentation free text line recognition; Accuracy; Character recognition; Feature extraction; Handwriting recognition; Hidden Markov models; Optical character recognition software; Text recognition; Auto MLP; Hidden Markov Models; Multilayer Perceptron; Optical Character Recognition; Scanning Neural Network; Segmentation free OCR;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
Type :
conf
DOI :
10.1109/DAS.2012.77
Filename :
6195344