DocumentCode :
1511187
Title :
Stroke-model-based character extraction from gray-level document images
Author :
Ye, Xiangyun ; Cheriet, Mohamed ; Suen, Ching Y.
Author_Institution :
Centre for Pattern Recognition & Machine Intelligence, Concordia Univ., Montreal, Que., Canada
Volume :
10
Issue :
8
fYear :
2001
fDate :
8/1/2001
Firstpage :
1152
Lastpage :
1161
Abstract :
Global gray-level thresholding techniques such as Otsu's method, and local gray-level thresholding techniques such as edge-based segmentation or the adaptive thresholding method, are powerful in extracting character objects from simple or slowly varying backgrounds. However, they are found to be insufficient when the backgrounds include sharply varying contours or fonts in different sizes. A stroke model is proposed to depict the local features of character objects as double-edges in a predefined size. This model enables us to detect thin connected components selectively, while ignoring relatively large backgrounds that appear complex. Meanwhile, since the stroke width restriction is fully factored in, the proposed technique can be used to extract characters in predefined font sizes. To process large volumes of documents efficiently, a hybrid method is proposed for character extraction from various backgrounds. Using the measurement of class separability to differentiate images with simple backgrounds from those with complex backgrounds, the hybrid method can process documents with different backgrounds by applying the appropriate methods. Experiments on extracting handwriting from a check image, as well as machine-printed characters from scene images, demonstrate the effectiveness of the proposed model.
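The hybrid dispatch described in the abstract can be illustrated with a minimal sketch. The abstract does not say which class separability measure the authors use, so the code below assumes the standard one associated with Otsu's method (maximum between-class variance divided by total variance). The function names `otsu_separability` and `choose_method`, and the cutoff of 0.8, are hypothetical choices made for illustration and do not come from the paper.

```python
from typing import Sequence, Tuple


def otsu_separability(pixels: Sequence[int], levels: int = 256) -> Tuple[int, float]:
    """Return (threshold, separability) for a flat sequence of gray levels.

    Pixels with value <= threshold form one class, the rest the other.
    Separability is the maximum between-class variance divided by the
    total variance, so it lies in [0, 1].
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("empty image")

    # Gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total_sum = sum(i * h for i, h in enumerate(hist))
    mean_total = total_sum / n
    var_total = sum(h * (i - mean_total) ** 2 for i, h in enumerate(hist)) / n
    if var_total == 0:
        return 0, 0.0  # constant image: nothing to separate

    best_t, best_between = 0, 0.0
    count0 = 0  # number of pixels in the class at or below t
    sum0 = 0    # sum of gray levels in that class
    for t in range(levels - 1):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = n - count0
        if count0 == 0 or count1 == 0:
            continue
        mu0 = sum0 / count0
        mu1 = (total_sum - sum0) / count1
        between = (count0 / n) * (count1 / n) * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t, best_between / var_total


def choose_method(pixels: Sequence[int], cutoff: float = 0.8) -> str:
    """Pick a method label from the separability score.

    Assumption: high separability suggests a simple background where
    global thresholding is enough; otherwise fall back to the
    stroke-model extraction. The cutoff value is illustrative only.
    """
    _, eta = otsu_separability(pixels)
    return "global" if eta >= cutoff else "stroke-model"
```

For a cleanly bimodal image such as `[10] * 50 + [200] * 50`, the separability is 1.0 and the sketch selects global thresholding. For a flat histogram such as `list(range(256))`, the score drops to about 0.75 and the sketch falls back to the stroke-model branch.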
Keywords :
adaptive signal processing; document image processing; feature extraction; adaptive thresholding method; character objects; check image; class separability measurement; contours; double-edges; edge-based segmentation; font size; global gray-level thresholding; gray-level document images; handwriting extraction; hybrid method; local features; local gray-level thresholding; machine-printed characters; scene images; simple backgrounds; slowly varying backgrounds; stroke width; stroke-model-based character extraction; thin connected components detection; Character recognition; Data mining; Image edge detection; Image recognition; Image segmentation; Indexing; Layout; Pixel; Text analysis; Web sites;
fLanguage :
English
Journal_Title :
Image Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1057-7149
Type :
jour
DOI :
10.1109/83.935031
Filename :
935031