  • DocumentCode
    3594008
  • Title
    Near-wordless document structure classification
  • Author
    Summers, Kristen
  • Volume
    1
  • fYear
    1995
  • Firstpage
    462
  • Abstract
    Automatic derivation of logical document structure from generic layout would enable the development of many highly flexible electronic document manipulation tools. This problem can be divided into two parts: segmenting the text into pieces and classifying those pieces as particular logical structures. This paper proposes an approach to classifying logical document structures according to their distance from predefined prototypes. The prototypes make minimal use of linguistic information, which reduces reliance on OCR accuracy and decreases language dependence. Different classes of logical structures, and the differing information required to classify them, are discussed. A prototype format is proposed, existing prototypes and a distance measurement are described, and performance results are provided.
  • Keywords
    document handling; document image processing; pattern recognition; OCR; document structure classification; electronic document manipulation tools; language-dependence; logical document structure; prototype format; segmentation of text; Adders; Distance measurement; Graphics; Marine vehicles; Optical character recognition software; Prototypes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the Third International Conference on Document Analysis and Recognition (ICDAR 1995)
  • Print_ISBN
    0-8186-7128-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.1995.599036
  • Filename
    599036
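The abstract describes labeling each segmented piece with the logical structure whose predefined prototype lies nearest to it. The Python sketch below illustrates that nearest-prototype idea under stated assumptions: the feature names (`num_lines`, `font_size_ratio`, `centered`, `indent`, `leading_symbol`), the three prototypes, the weights, and the weighted Euclidean distance are all hypothetical stand-ins, not the prototype format or distance measurement defined in the paper. Only layout features are used, in keeping with the aim of relying minimally on OCR output.

```python
import math
from dataclasses import dataclass

# Hypothetical layout features and per-feature weights (illustrative
# assumptions only; the paper's actual prototype format is not reproduced).
WEIGHTS: dict[str, float] = {
    "num_lines": 0.25,       # number of text lines in the segmented piece
    "font_size_ratio": 4.0,  # font size relative to the body text
    "centered": 1.0,         # 1.0 if the piece is horizontally centered
    "indent": 10.0,          # left indent as a fraction of column width
    "leading_symbol": 2.0,   # 1.0 if the piece starts with a bullet or number
}


@dataclass(frozen=True)
class Prototype:
    label: str
    features: dict[str, float]  # expected value for every key in WEIGHTS


# Hypothetical prototypes for three logical structure classes.
PROTOTYPES: tuple[Prototype, ...] = (
    Prototype("heading", {"num_lines": 1, "font_size_ratio": 1.5, "centered": 1.0,
                          "indent": 0.0, "leading_symbol": 0.0}),
    Prototype("paragraph", {"num_lines": 5, "font_size_ratio": 1.0, "centered": 0.0,
                            "indent": 0.0, "leading_symbol": 0.0}),
    Prototype("list_item", {"num_lines": 2, "font_size_ratio": 1.0, "centered": 0.0,
                            "indent": 0.1, "leading_symbol": 1.0}),
)


def distance(piece: dict[str, float], proto: Prototype) -> float:
    """Weighted Euclidean distance between a piece and a prototype.

    Features missing from the piece are treated as 0.0.
    """
    return math.sqrt(sum(
        weight * (piece.get(name, 0.0) - proto.features[name]) ** 2
        for name, weight in WEIGHTS.items()
    ))


def classify(piece: dict[str, float],
             prototypes: tuple[Prototype, ...] = PROTOTYPES) -> tuple[str, float]:
    """Return the label of the nearest prototype and the distance to it."""
    best = min(prototypes, key=lambda p: distance(piece, p))
    return best.label, distance(piece, best)


if __name__ == "__main__":
    title = {"num_lines": 1, "font_size_ratio": 1.4, "centered": 1.0,
             "indent": 0.0, "leading_symbol": 0.0}
    print(classify(title))
```

With these made-up values, the sample one-line, enlarged, centered piece lands nearest the `heading` prototype. The weights exist only to put features with very different ranges (line counts versus 0/1 flags) on a comparable footing so that no single feature dominates the distance.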