Title :
A model guided document image analysis scheme
Author :
Harit, Gaurav ; Chaudhury, Santanu ; Gupta, P. ; Vohra, N. ; Joshi, S.D.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., New Delhi, India
fDate :
2001
Abstract :
This paper presents a new model-based document image segmentation scheme that uses XML-DTDs (eXtensible Markup Language Document Type Definitions). Given a document image, the algorithm can select the appropriate model. A new wavelet-based tool has been designed for distinguishing text from non-text regions and for characterizing font sizes. Our model-based analysis scheme uses this tool to identify the logical components of a document image.
Keywords :
document image processing; hypermedia markup languages; image segmentation; wavelet transforms; XML document type definition; document layout analysis; font size characterization; logical components identification; model-based document image segmentation scheme; model-guided document image analysis scheme; nontext regions; text regions; wavelet-based tool; Geometry; Graphics; Image analysis; Image segmentation; Layout; Shape; Text analysis;
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953963