DocumentCode
2307733
Title
Text extraction from degraded document images
Author
Hedjam, Rachid ; Moghaddam, Reza Farrahi ; Cheriet, Mohamed
Author_Institution
Synchromedia Lab. for Multimedia Commun. in Telepresence, Ecole de Technol. Supérieure, Montréal, QC, Canada
fYear
2010
fDate
5-6 July 2010
Firstpage
247
Lastpage
252
Abstract
In this work, a robust segmentation method for text extraction from historical document images is presented. The method is based on Markovian-Bayesian clustering on local graphs at both the pixel and regional scales. It consists of three steps. In the first step, an over-segmented map of the input image is created. The resulting map provides rich and accurate semi-mosaic fragments. In the second step, the map is processed: similar and adjoining sub-regions are merged together to form accurate text shapes. The output of the second step, which contains accurate shapes, is processed in the final step, in which the segmentation is obtained by clustering with a fixed number of classes. The method makes significant use of local and spatial correlation and coherence, both in the image and between the stroke parts, and is therefore very robust with respect to degradation. The resulting segmented text is smooth, and weak connections and loops are preserved thanks to the robust nature of the method. The output can be used in subsequent skeletonization processes, which require preservation of the text topology to achieve high performance. The method is tested on real degraded document images with promising results.
Keywords
Bayes methods; Markov processes; document image processing; feature extraction; image segmentation; image thinning; text analysis; Markovian-Bayesian clustering; degraded document image; historical document image; robust segmentation method; skeletonization process; text extraction; text topology; Document image; Graph-partitioning; Image binarization; Image segmentation; MRF;
fLanguage
English
Publisher
IEEE
Conference_Titel
Visual Information Processing (EUVIP), 2010 2nd European Workshop on
Conference_Location
Paris
Print_ISBN
978-1-4244-7288-8
Type
conf
DOI
10.1109/EUVIP.2010.5699135
Filename
5699135