• DocumentCode
    3307580
  • Title

    Page segmentation and classification utilising a bottom-up approach

  • Author

    Drivas, Dimitrios ; Amin, Adnan

  • Author_Institution
    Sch. of Comput. Sci. & Eng., New South Wales Univ., Kensington, NSW, Australia
  • Volume
    2
  • fYear
    1995
  • fDate
    14-16 Aug 1995
  • Firstpage
    610
  • Abstract
    This paper presents an analysis of the connected components extracted from the binary image of a document page. The analysis yields information that is used to perform skew correction, segmentation and classification of the document. We present a new algorithm for determining the skew angle of lines of text in a document image, with the advantage that it needs only one iteration to determine the skew angle. Experiments on over 30 pages show that the method works well on a wide variety of layouts, including sparse textual regions, mixed fonts, multiple columns, and even documents with a high graphical content.
  • Keywords
    character sets; document image processing; image classification; image segmentation; optical character recognition; OCR; binary image analysis; bottom-up approach; document page analysis; experiments; graphics; layouts; mixed fonts; multiple columns; page classification; page segmentation; skew angle; skew correction; sparse textual regions; text; Australia; Computer science; Data mining; Detection algorithms; Graphics; Image analysis; Image segmentation; Information analysis; Layout; Optical character recognition software
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
  • Conference_Location
    Montreal, Que.
  • Print_ISBN
    0-8186-7128-9
  • Type

    conf

  • DOI
    10.1109/ICDAR.1995.601970
  • Filename
    601970