Title :
Improving OCR performance with background image elimination
Author :
Mande Shen; Hansheng Lei
Author_Institution :
School of Electronic and Electrical Engineering, Wuhan Textile University, Hubei 43000 China
Abstract :
A critical step in OCR is detecting text characters in a document image. However, some documents contain embedded background images, which often mislead character detection algorithms. For example, small dots or sharp edges in the background image are often given bounding boxes as if they were characters and passed to the next stage of the OCR pipeline, causing a chain of errors. Motivated by this observation, we present a novel and cost-effective image preprocessing method for removing background images. We first enhance the document image before OCR, using brightness and chromaticity as contrast parameters. We then convert the color image to grayscale and threshold it. In this way, background images can be removed effectively without degrading the quality of the text characters. The method was tested using Tesseract (an open source OCR engine) and compared with two commercial OCR packages, ABBYY FineReader and HANWANG (OCR software for Chinese characters). The experimental results show that recognition accuracy improves significantly after background images are removed.
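The three steps named in the abstract (enhance, convert to gray, threshold) can be illustrated with a minimal sketch. The abstract does not give the formulas or parameter values, so everything specific here is an assumption made for illustration: brightness is taken as the HSV value, chromaticity as the HSV saturation, and the function names and the cutoffs `brightness_max`, `chroma_max`, and `t` are hypothetical. It is not the authors' implementation.

```python
# Illustrative sketch only: the formulas and cutoffs below are assumptions,
# not values taken from the paper.

WHITE = (255, 255, 255)


def enhance(pixels, brightness_max=0.6, chroma_max=0.35):
    """Push pixels that are too bright or too colorful to white.

    pixels is a list of rows of (r, g, b) tuples with channels in 0..255.
    Dark, nearly colorless pixels (assumed to be text) are kept unchanged.
    """
    out = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            hi, lo = max(r, g, b), min(r, g, b)
            brightness = hi / 255.0                   # HSV value (assumed measure)
            chroma = (hi - lo) / hi if hi else 0.0    # HSV saturation (assumed measure)
            keep = brightness <= brightness_max and chroma <= chroma_max
            new_row.append((r, g, b) if keep else WHITE)
        out.append(new_row)
    return out


def to_gray(pixels):
    """Convert RGB to gray using the common ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in pixels]


def threshold(gray, t=128):
    """Binarize: 0 for pixels darker than t (text), 255 otherwise."""
    return [[0 if v < t else 255 for v in row] for row in gray]


def remove_background(pixels):
    """Run the enhance, gray conversion, and threshold steps in order."""
    return threshold(to_gray(enhance(pixels)))
```

In this sketch a dark red pixel such as (90, 30, 30) has a gray value of about 48, so a plain gray threshold at 128 would keep it as text. The chromaticity test in `enhance` whitens it first, which is the kind of case where a color-aware step before thresholding helps.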
Keywords :
"Optical character recognition software","Brightness","Image color analysis","Measurement","Image edge detection","Distortion"
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2015 12th International Conference on
DOI :
10.1109/FSKD.2015.7382178