• DocumentCode
    3058186
  • Title
    Page segmentation without rectangle assumption
  • Author
    Saitoh, Takashi; Pavlidis, Theo
  • Author_Institution
    Ricoh R&D Center, Kanagawa, Japan
  • fYear
    1992
  • fDate
    30 Aug-3 Sep 1992
  • Firstpage
    277
  • Lastpage
    280
  • Abstract
    A new technique for page segmentation without skew normalization is described and applied to complex printed-page layouts in both English and Japanese. Because the technique makes no assumption about the shape of blocks, it can handle skewed pages and can also be extended to handle documents whose columns are not rectangles. The technique follows a bottom-up strategy: connected components are extracted from a reduced image and classified using their local information. Since the skew angle is also estimated from the local information of blocks, the computation time is very short. Text blocks are then merged into string lines and into columns using the skew information.
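The abstract does not give implementation details. As a hedged illustration only, the Python sketch below shows one way the first bottom-up steps could look: the binary page is OR-reduced, 8-connected components are extracted from the reduced image, and the skew angle is taken as the median principal-axis orientation of elongated components, computed from each component's second-order central moments. The OR-reduction, the moment-based orientation, the thresholds `min_pixels` and `min_ratio`, and all function names are assumptions made for this example, not the authors' method; classifying components and merging them into string lines and columns are not shown.

```python
import math
from collections import deque
from statistics import median


def reduce_or(img, factor):
    """OR-reduce a binary image (list of rows of 0/1): a reduced pixel is 1 if
    any pixel in its factor x factor cell is 1. Assumed reduction scheme."""
    h, w = len(img), len(img[0])
    out = [[0] * ((w + factor - 1) // factor) for _ in range((h + factor - 1) // factor)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                out[y // factor][x // factor] = 1
    return out


def components(img):
    """Return the pixel lists (row, col) of all 8-connected foreground components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def principal_axis(pixels):
    """Orientation (degrees) and the two eigenvalues of a component's
    second-order central moments. Rows grow downward, so a positive angle
    means the component descends to the right."""
    n = len(pixels)
    cy = sum(y for y, _ in pixels) / n
    cx = sum(x for _, x in pixels) / n
    mu20 = sum((x - cx) ** 2 for _, x in pixels)
    mu02 = sum((y - cy) ** 2 for y, _ in pixels)
    mu11 = sum((x - cx) * (y - cy) for y, x in pixels)
    half_trace = (mu20 + mu02) / 2
    root = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    angle = math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))
    return angle, half_trace + root, half_trace - root


def estimate_skew(img, factor=2, min_pixels=5, min_ratio=4.0):
    """Median orientation of elongated components on the reduced image.
    min_pixels and min_ratio are hypothetical thresholds, not from the paper."""
    angles = []
    for pixels in components(reduce_or(img, factor)):
        if len(pixels) < min_pixels:
            continue  # too small to give a reliable local orientation
        angle, major, minor = principal_axis(pixels)
        if major > 0 and major >= min_ratio * minor:
            angles.append(angle)  # elongated, line-like component
    return median(angles) if angles else 0.0
```

For example, a page whose text lines are drawn as 2-pixel-thick bands with slope 0.1 should give an estimate close to atan(0.1), about 5.7 degrees. Taking the median rather than the mean keeps a few non-text components, such as vertical rules, from dominating the estimate, although a real system would filter them out with a classification step like the one the abstract describes.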
  • Keywords
    document image processing; image segmentation; optical character recognition; OCR preprocessing; block extraction; bottom-up strategy; page segmentation; printed-page layouts; skew angle; skewed pages; string lines; Aggregates; Computer science; Data mining; Image segmentation; Optical character recognition software; Pixel; Research and development; Shape; Streaming media; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 11th IAPR International Conference on Pattern Recognition, 1992, Vol. II, Conference B: Pattern Recognition Methodology and Systems
  • Conference_Location
    The Hague
  • Print_ISBN
    0-8186-2915-0
  • Type
    conf
  • DOI
    10.1109/ICPR.1992.201772
  • Filename
    201772