Title :
Adaptative Smart-Binarization Method: For Images of Business Documents
Author :
Gaceb, Djamel ; Lebourgeois, Frank ; Duong, Jean
Author_Institution :
INSA-Lyon, Univ. de Lyon, Villeurbanne, France
Abstract :
Automatic reading systems for business documents require fast and accurate reading of zones of interest using OCR technology. The quality of the binarization has a major impact on the quality of the binary characters. In this paper we propose a smart-binarization method for images of business documents. In our work, we considered the different degradations of document images, real-time constraints and the high spatial resolution of the images. The quality of each pixel is estimated using hierarchical local thresholding in order to classify it as a foreground, background or ambiguous pixel. The ambiguous pixels, which represent the degraded zones, cannot be binarized with the same local thresholding. The global quality of the image is thus estimated from the density of these degraded pixels. If the image is considered degraded, we apply a second separation to the ambiguous pixels to assign them to the background or the foreground. This second process uses our improved relaxation method, which we have accelerated for the first time in order to integrate it into an automatic document reading system. Compared to existing binarization approaches (local or global), our approach offers a better reading of characters by the OCR. The computation time remains constant as the local window size varies, through the use of integral images. The method was developed in the context of the DOD project (Documents On Demand) at the request of the ITESOFT company.
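The two mechanisms named in the abstract, three-way pixel labeling and window statistics computed from an integral image, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `margin` parameter and the simple "local mean plus or minus a margin" rule are assumptions standing in for the hierarchical local thresholding, dark text on a light background is assumed, and the relaxation step applied to ambiguous pixels is not shown.

```python
# Illustrative sketch only: names, the margin rule and the default values
# are assumptions, not taken from the paper.

FOREGROUND, BACKGROUND, AMBIGUOUS = 0, 1, 2


def integral_image(img):
    """Build a summed-area table with one extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def window_mean(sat, y, x, radius):
    """Mean gray level of the window centred on (y, x), clipped to the image.

    Uses four table lookups, so the cost does not depend on the radius.
    """
    h, w = len(sat) - 1, len(sat[0]) - 1
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((y1 - y0) * (x1 - x0))


def classify(img, radius=1, margin=20):
    """Label each pixel as foreground, background or ambiguous.

    Assumed rule: clearly darker than the local mean is foreground,
    clearly lighter is background, anything in between is ambiguous.
    """
    sat = integral_image(img)
    labels = []
    for y, row in enumerate(img):
        out = []
        for x, value in enumerate(row):
            mean = window_mean(sat, y, x, radius)
            if value < mean - margin:
                out.append(FOREGROUND)
            elif value > mean + margin:
                out.append(BACKGROUND)
            else:
                out.append(AMBIGUOUS)
        labels.append(out)
    return labels


def ambiguous_density(labels):
    """Fraction of pixels labeled ambiguous, a stand-in for global image quality."""
    total = sum(len(row) for row in labels)
    return sum(row.count(AMBIGUOUS) for row in labels) / total
```

In a pipeline of the kind the abstract describes, the ambiguous density would be compared with a quality threshold to decide whether the second, relaxation-based pass is needed; that threshold is not given in the abstract.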
Keywords :
business data processing; character recognition; document image processing; image classification; image resolution; DOD project; OCR technology; ambiguous pixel classification; automatic reading systems; background pixel classification; binarization quality; business document image; documents-on-demand project; foreground pixel classification; hierarchical local thresholding; improved relaxation method; optical character recognition; pixel quality; smart binarization method; Business; Entropy; Gray-scale; Histograms; Labeling; Optical character recognition software; Text analysis; Document Image Processing; Industrial application; Relaxation; Smart-binarization;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.31