DocumentCode :
1796120
Title :
Color segmentation for historical documents using Markov random fields
Author :
Pantke, Werner ; Haak, Arne ; Margner, Volker
Author_Institution :
Inst. for Commun. Technol., Tech. Univ. Braunschweig, Braunschweig, Germany
fYear :
2014
fDate :
11-14 Aug. 2014
Firstpage :
151
Lastpage :
156
Abstract :
Binarization is often used as a preprocessing step for pixel-wise text extraction from scanned historical documents. Today, these documents are scanned in color and at high resolution. Reducing color images to grayscale and then binarizing them discards information and often leads to unsatisfactory processing results. In this paper, color segmentation is used instead of binarization to separate text from background in historical manuscripts. A color segmentation approach based on Markov random fields with a reduced set of required parameters is presented to segment text written in different colors from noisy page background. First tests with historical Arabic manuscripts show promising results. In the case of words written in light red, our approach shows better results than a state-of-the-art binarization approach.
Keywords :
Markov processes; document image processing; history; image colour analysis; image resolution; image segmentation; text detection; Markov random fields; binarization approach; color segmentation approach; grayscale images; historical Arabic manuscripts; light red color; noisy page background; pixel-wise document text extraction; scanned historical documents; text segmentation; Cooling; Covariance matrices; Image color analysis; Simulated annealing; Vectors; binarization; color segmentation; historical documents;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of
Conference_Location :
Tunis
Type :
conf
DOI :
10.1109/SOCPAR.2014.7007997
Filename :
7007997