DocumentCode
3186966
Title
A knowledge-based approach for textual information extraction from mixed text/graphics complex document images
Author
Chen, Yen-Lin
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Nat. Taipei Univ. of Technol., Taipei, Taiwan
fYear
2010
fDate
10-13 Oct. 2010
Firstpage
3270
Lastpage
3277
Abstract
A new knowledge-based technique for extracting and identifying text-lines from various real-life mixed text/graphics complex document images is presented in this paper. The proposed technique first decomposes the document image into distinct object planes to separate homogeneous objects, including textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method is then applied to the resulting planes to obtain text-lines with different characteristics in each plane. The proposed system offers high flexibility and expandability, since additional types of real-life and future complex document images can be handled simply by adding or updating rules. The experimental and comparative results demonstrate the effectiveness and advantages of the proposed knowledge-based technique in extracting text-lines with various illuminations, sizes, and font styles from various types of mixed text/graphics complex document images.
Keywords
computer graphics; document handling; document image processing; information retrieval; knowledge based systems; text analysis; homogeneous objects; knowledge based approach; mixed text-graphics complex document image; nontext object; text lines identification; textual information extraction; textual region; Image segmentation; Document analysis; complex document images; knowledge-based systems; region segmentation; text extraction;
fLanguage
English
Publisher
ieee
Conference_Titel
Systems Man and Cybernetics (SMC), 2010 IEEE International Conference on
Conference_Location
Istanbul
ISSN
1062-922X
Print_ISBN
978-1-4244-6586-6
Type
conf
DOI
10.1109/ICSMC.2010.5642309
Filename
5642309