• DocumentCode
    2510965
  • Title
    The PAGE (Page Analysis and Ground-Truth Elements) Format Framework
  • Author
    Pletschacher, S.; Antonacopoulos, A.
  • Author_Institution
    Pattern Recognition & Image Analysis (PRImA) Research Lab, University of Salford, Salford, UK
  • fYear
    2010
  • fDate
    23-26 Aug. 2010
  • Firstpage
    257
  • Lastpage
    260
  • Abstract
    There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.
  • Keywords
    document image processing; image enhancement; image representation; image segmentation; optical character recognition; ICDAR page segmentation competition series; OCR; PAGE; XML-based page image representation framework; binarisation; corresponding corrections; document image analysis; document image enhancement; document representation formats; format framework; geometric distortions; ground-truth elements; image borders; image characteristics; layout analysis; page analysis; Joining processes; Layout; Optical character recognition software; Performance evaluation; Pipelines; Text analysis; XML; Document Analysis; Page Representation Formats; Performance Evaluation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2010 20th International Conference on Pattern Recognition (ICPR)
  • Conference_Location
    Istanbul
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-7542-1
  • Type
    conf
  • DOI
    10.1109/ICPR.2010.72
  • Filename
    5597587