• DocumentCode
    2629196
  • Title
    Page grammars and page parsing. A syntactic approach to document layout recognition
  • Author
    Conway, Alan
  • Author_Institution
    Hitachi Dublin Lab., Trinity Coll., Dublin, Ireland
  • fYear
    1993
  • fDate
    20-22 Oct 1993
  • Firstpage
    761
  • Lastpage
    764
  • Abstract
    Describes a syntactic approach to deducing the logical structure of printed documents from their physical layout. Page layout is described by a two-dimensional grammar, similar to a context-free string grammar, and a chart parser is used to parse segmented page images according to the grammar. This process is part of a system that reads scanned document images and produces computer-readable text in a logical mark-up format such as SGML. The system is briefly outlined, the grammar formalism and the parsing algorithm are described in detail, and some experimental results are reported.
  • Keywords
    context-free grammars; document image processing; image recognition; page description languages; 2D grammar; SGML; chart parser; computer-readable text; context-free string grammar; document layout recognition; logical document structure deduction; logical mark-up format; page grammars; page layout; page parsing; scanned document images; segmented page images; syntactic approach; Character recognition; Educational institutions; Graphics; Image segmentation; Indexing; Laboratories; Layout; Text recognition; Tree graphs
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
  • Conference_Location
    Tsukuba Science City
  • Print_ISBN
    0-8186-4960-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1993.395626
  • Filename
    395626
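The abstract's core mechanism, a two-dimensional grammar applied to segmented page regions by a chart parser, can be illustrated with a minimal Python sketch. It is not the paper's formalism: the block categories, the two spatial relations (`above`, `leftof`), the binary-rule format, and the names `Edge`, `holds`, and `parse` are all hypothetical. Where a string chart parser records contiguous spans, this sketch records sets of block indices, and a rule fires only when its two constituents cover disjoint blocks and satisfy the stated spatial relation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical bounding box (x0, y0, x1, y1); y grows downward as in image coordinates.
Box = Tuple[float, float, float, float]
# Hypothetical binary rule (lhs, relation, first, second): lhs -> first <relation> second.
Rule = Tuple[str, str, str, str]


@dataclass(frozen=True)
class Edge:
    """A completed chart entry: a category covering a set of segmented blocks."""
    category: str
    blocks: FrozenSet[int]  # indices of the blocks this constituent covers
    box: Box                # bounding box enclosing those blocks


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def holds(rel: str, a: Box, b: Box) -> bool:
    """Two illustrative (assumed) spatial relations between constituents."""
    if rel == "above":   # a ends at or before the top of b
        return a[3] <= b[1]
    if rel == "leftof":  # a ends at or before the left edge of b
        return a[2] <= b[0]
    raise ValueError(f"unknown relation {rel!r}")


def parse(blocks: List[Tuple[str, Box]], rules: List[Rule], start: str) -> bool:
    """Agenda-driven bottom-up chart parse; True if `start` covers every block."""
    chart = set()
    # Seed the agenda with one terminal edge per segmented block.
    agenda = [Edge(cat, frozenset([i]), box) for i, (cat, box) in enumerate(blocks)]
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue  # already derived; avoids redundant work and infinite loops
        chart.add(edge)
        # Combine the new edge with every chart edge over disjoint blocks, in both orders.
        for other in list(chart):
            if edge.blocks & other.blocks:
                continue
            for lhs, rel, first, second in rules:
                for a, b in ((edge, other), (other, edge)):
                    if a.category == first and b.category == second and holds(rel, a.box, b.box):
                        agenda.append(Edge(lhs, a.blocks | b.blocks, union(a.box, b.box)))
    everything = frozenset(range(len(blocks)))
    return any(e.category == start and e.blocks == everything for e in chart)


# Usage: a title over an author line, over two side-by-side body columns.
blocks = [
    ("title", (100, 50, 500, 80)),
    ("author", (150, 90, 450, 110)),
    ("body", (50, 130, 290, 700)),
    ("body", (310, 130, 550, 700)),
]
rules = [
    ("header", "above", "title", "author"),
    ("columns", "leftof", "body", "body"),
    ("page", "above", "header", "columns"),
]
print(parse(blocks, rules, "page"))  # prints True
```

Keying edges by block sets rather than string intervals is what lets a single agenda loop handle both vertical and horizontal composition. A fuller page grammar would likely also need unary rules, adjacency or alignment constraints, and a way to recover the parse tree rather than only a yes/no answer.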