DocumentCode :
2079367
Title :
Document image understanding: geometric and logical layout
Author :
Haralick, Robert M.
Author_Institution :
Dept. of Electr. Eng., Washington Univ., Seattle, WA, USA
fYear :
1994
fDate :
21-23 Jun 1994
Firstpage :
385
Lastpage :
390
Abstract :
Document image understanding encompasses the technology required to make paper documents equivalent to other computer exchange media like floppies, tapes, and CDROMs. The physical reader of the paper document is the scanner, just as the physical reader of the floppy is the floppy drive, the physical reader of the tape cartridge is the tape cartridge drive, and the physical reader of the CDROM is the CDROM drive. In the survey presented, we restrict ourselves to documents such as business letters, forms, and scientific and technical articles such as those found in archival journals and technical conferences. Understanding such documents involves estimating the rotation skew of each document page, determining the geometric page layout, labeling blocks as text or non-text, determining the read order for text blocks, recognizing the text of text blocks through an OCR system, determining the logical page layout, and formatting the data and information of the document in a suitable way for use by a word processing system or by an information retrieval system.
Keywords :
character recognition; computational geometry; document handling; document image processing; image processing; text processing; word processing; CDROMs; OCR system; business letters; computer exchange media; data formatting; document image understanding; document page; floppies; geometric page layout; information retrieval system; logical layout; logical page layout; non-text; read order; rotation skew; scanner; technical articles; text blocks; word processing system
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Vision and Pattern Recognition, 1994. Proceedings CVPR '94., 1994 IEEE Computer Society Conference on
Conference_Location :
Seattle, WA
ISSN :
1063-6919
Print_ISBN :
0-8186-5825-8
Type :
conf
DOI :
10.1109/CVPR.1994.323855
Filename :
323855