DocumentCode
3449296
Title
A novel method to extract text from compound document images
Author
Song, Huaibo ; He, Dongjian
Author_Institution
Coll. of Mech. & Electron. Eng., Northwest A&F Univ., Yangling, China
Volume
2
fYear
2010
fDate
29-31 Oct. 2010
Firstpage
143
Lastpage
146
Abstract
To separate text embedded in colored and/or complex backgrounds, a novel segmentation algorithm is presented for complicated documents in which the text overlaps the background; the work can be seen as a new approach to multi-thresholding segmentation. In the first step, least-squares curve fitting is used to fit the image histogram. In the second step, the image is split into several layers, comprising text layers and background layers. These layers are then merged according to given rules to simplify subsequent image processing. Finally, the text layers are processed using different techniques to extract the text. Experiments on a large number of such images show that the proposed method outperforms commonly used segmentation methods and has good applicability.
Keywords
curve fitting; document image processing; feature extraction; image segmentation; least squares approximations; text analysis; background layers; compound document image; image histogram; image processing; least square method; multithresholding segmentation method; text extraction method; text layers; Image color analysis; Image edge detection; complicated color document image; image layers
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Computing and Intelligent Systems (ICIS), 2010 IEEE International Conference on
Conference_Location
Xiamen
Print_ISBN
978-1-4244-6582-8
Type
conf
DOI
10.1109/ICICISYS.2010.5658780
Filename
5658780