Title :
Stroke-model-based character extraction from gray-level document images
Author :
Ye, Xiangyun ; Cheriet, Mohamed ; Suen, Ching Y.
Author_Institution :
Centre for Pattern Recognition & Machine Intelligence, Concordia Univ., Montreal, Que., Canada
Date :
8/1/2001
Abstract :
Global gray-level thresholding techniques such as Otsu's method, and local gray-level thresholding techniques such as edge-based segmentation or the adaptive thresholding method, are powerful in extracting character objects from simple or slowly varying backgrounds. However, they are found to be insufficient when the backgrounds include sharply varying contours or fonts in different sizes. A stroke model is proposed to depict the local features of character objects as double-edges in a predefined size. This model enables us to detect thin connected components selectively, while ignoring relatively large backgrounds that appear complex. Meanwhile, since the stroke width restriction is fully factored in, the proposed technique can be used to extract characters in predefined font sizes. To process large volumes of documents efficiently, a hybrid method is proposed for character extraction from various backgrounds. Using the measurement of class separability to differentiate images with simple backgrounds from those with complex backgrounds, the hybrid method can process documents with different backgrounds by applying the appropriate methods. Experiments on extracting handwriting from a check image, as well as machine-printed characters from scene images, demonstrate the effectiveness of the proposed model.
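The two ideas in the abstract, a class separability test that routes an image to the appropriate method and a width-limited double-edge response, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the cutoff `eta_cutoff=0.8`, and the exact double-edge formula (a pair of samples spaced `width` pixels apart that must both be brighter than the pixel between them) are assumptions made here for illustration. Separability is taken as the usual Otsu ratio of maximum between-class variance to total variance, and strokes are assumed to be dark on a light background.

```python
def otsu_separability(pixels, levels=256):
    """Return (threshold, eta) for a flat list of integer gray levels.

    eta is the maximum between-class variance divided by the total
    variance, so it lies in [0, 1]; values near 1 suggest a clean
    two-class (simple background) histogram.
    """
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    mean_total = sum(g * h for g, h in enumerate(hist)) / n
    var_total = sum(h * (g - mean_total) ** 2 for g, h in enumerate(hist)) / n
    if var_total == 0:
        return 0, 0.0  # flat image: nothing to separate

    total_sum = mean_total * n
    best_t, best_between = 0, 0.0
    w0 = 0    # number of pixels with gray level <= t
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = n - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        between = (w0 / n) * (w1 / n) * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t, best_between / var_total


def choose_method(pixels, eta_cutoff=0.8):
    """Hypothetical dispatch: global threshold if separable, else stroke model."""
    t, eta = otsu_separability(pixels)
    return ("global", t) if eta >= eta_cutoff else ("stroke_model", None)


def double_edge_response(row, width):
    """1-D double-edge response for dark strokes narrower than `width`.

    For each pixel x, slide a pair of samples spaced `width` apart so
    that they straddle x, and keep the largest amount by which BOTH
    samples exceed row[x]. Dark regions `width` pixels wide or wider
    never have both samples on the background, so they score 0.
    """
    out = [0] * len(row)
    for x in range(len(row)):
        best = 0
        for i in range(1, width):
            left, right = x - i, x + width - i
            if left < 0 or right >= len(row):
                continue  # pair falls outside the row
            best = max(best, min(row[left], row[right]) - row[x])
        out[x] = best
    return out
```

In this sketch a two-level image is routed to global thresholding, while an image whose histogram is spread evenly falls through to the stroke model. In the 1-D response, a two-pixel dark run responds strongly at `width=4`, whereas a six-pixel dark run produces no response at all, which mirrors the abstract's claim that thin components are detected selectively while large dark regions are ignored.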
Keywords :
adaptive signal processing; document image processing; feature extraction; adaptive thresholding method; character objects; check image; class separability measurement; contours; double-edges; edge-based segmentation; font size; global gray-level thresholding; gray-level document images; handwriting extraction; hybrid method; local features; local gray-level thresholding; machine-printed characters; scene images; simple backgrounds; slowly varying backgrounds; stroke width; stroke-model-based character extraction; thin connected components detection; Character recognition; Data mining; Image edge detection; Image recognition; Image segmentation; Indexing; Layout; Pixel; Text analysis; Web sites;
Journal_Title :
Image Processing, IEEE Transactions on