DocumentCode
2629196
Title
Page grammars and page parsing. A syntactic approach to document layout recognition
Author
Conway, Alan
Author_Institution
Hitachi Dublin Lab., Trinity Coll., Dublin, Ireland
fYear
1993
fDate
20-22 Oct 1993
Firstpage
761
Lastpage
764
Abstract
Describes a syntactic approach to deducing the logical structure of printed documents from their physical layout. Page layout is described by a two-dimensional grammar, similar to a context-free string grammar, and a chart parser is used to parse segmented page images according to the grammar. This process is part of a system that reads scanned document images and produces computer-readable text in a logical mark-up format such as SGML. The system is briefly outlined, the grammar formalism and the parsing algorithm are described in detail, and some experimental results are reported.
Keywords
context-free grammars; document image processing; image recognition; page description languages; 2D grammar; SGML; chart parser; computer-readable text; context-free string grammar; document layout recognition; logical document structure deduction; logical mark-up format; page grammars; page layout; page parsing; scanned document images; segmented page images; syntactic approach; Character recognition; Educational institutions; Graphics; Image segmentation; Indexing; Laboratories; Layout; Text recognition; Tree graphs
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location
Tsukuba Science City
Print_ISBN
0-8186-4960-7
Type
conf
DOI
10.1109/ICDAR.1993.395626
Filename
395626