DocumentCode :
3059431
Title :
An OCR-independent character segmentation using shortest-path in grayscale document images
Author :
Tse, Jia ; Curtis, Dean ; Jones, Christopher ; Yfantis, Evangelos
Author_Institution :
Univ. of Nevada, Las Vegas
fYear :
2007
fDate :
13-15 Dec. 2007
Firstpage :
142
Lastpage :
147
Abstract :
An optical character recognition (OCR) system with a high recognition rate is challenging to develop. One of the major contributors to OCR errors is smeared characters. Several factors, such as poor scanning quality and a poor binarization technique, lead to the smearing of characters. Typical approaches to character segmentation fall into three major categories: image-based, recognition-based, and holistic-based. Among these approaches, the segmentation path can be linear or non-linear. Our paper proposes a non-linear approach to segmenting characters in grayscale document images. Our method first determines whether characters are smeared together using general character features. The correct segmentation path is then found using a shortest-path approach. We achieved a segmentation accuracy of 95% over a set of about 2,000 smeared characters.
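The abstract does not give the cost function or graph construction the authors use, so the following is only a minimal sketch of what a non-linear shortest-path cut through a grayscale image could look like. It assumes that the cost of a pixel is its darkness (255 minus intensity) and that the path moves one row down per step, shifting at most one column left or right. The function name `shortest_cut_path` and the toy `image` are hypothetical and not taken from the paper.

```python
def shortest_cut_path(gray):
    """Return one column index per row for the cheapest top-to-bottom cut.

    gray: list of rows, each a list of intensities from 0 (ink) to 255 (white).
    Assumption (not from the paper): stepping on a pixel costs 255 - intensity,
    so the path prefers background and avoids dark ink where it can.
    """
    rows, cols = len(gray), len(gray[0])
    cost = [[255 - v for v in row] for row in gray]

    # total[r][c] = cheapest cost of any path from the top row down to (r, c)
    total = [cost[0][:]]
    # back[r][c] = column in row r-1 that the cheapest path to (r, c) came from
    back = [[0] * cols]

    for r in range(1, rows):
        row_total, row_back = [], []
        for c in range(cols):
            # Allowed predecessors: directly above or diagonally above.
            candidates = [p for p in (c - 1, c, c + 1) if 0 <= p < cols]
            best = min(candidates, key=lambda p: total[r - 1][p])
            row_total.append(total[r - 1][best] + cost[r][c])
            row_back.append(best)
        total.append(row_total)
        back.append(row_back)

    # Start from the cheapest pixel in the bottom row and walk back up.
    c = min(range(cols), key=lambda j: total[-1][j])
    path = [c]
    for r in range(rows - 1, 0, -1):
        c = back[r][c]
        path.append(c)
    path.reverse()
    return path


# Toy 4x5 image: 0 is ink, 255 is background. The white gap bends to the right,
# so a straight vertical cut at column 2 would have to cross ink.
image = [
    [0, 0, 255, 0, 0],
    [0, 0, 200, 255, 0],
    [0, 0, 0, 255, 0],
    [0, 0, 0, 255, 0],
]
print(shortest_cut_path(image))
```

On this toy image the returned path starts in column 2 and shifts to column 3, which illustrates why a non-linear cut can separate touching characters where a straight vertical line cannot.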
Keywords :
document image processing; image segmentation; optical character recognition; OCR-independent character segmentation; grayscale document image; holistic-based approach; image-based approach; nonlinear approach; recognition-based approach; shortest path approach; Application software; Character recognition; Gray-scale; Heart; Image recognition; Image segmentation; Machine learning; Nonlinear optics; Optical character recognition software; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
Conference_Location :
Cincinnati, OH
Print_ISBN :
978-0-7695-3069-7
Type :
conf
DOI :
10.1109/ICMLA.2007.21
Filename :
4457222