• DocumentCode
    2631828
  • Title
    A block segmentation method for document images with complicated column structures
  • Author
    Hirayama, Yuki
  • Author_Institution
    IBM Japan Ltd., Yamato city, Kanagawa, Japan
  • fYear
    1993
  • fDate
    20-22 Oct 1993
  • Firstpage
    91
  • Lastpage
    94
  • Abstract
    Presents a novel block segmentation method for document images that can be applied to various document formats. Some documents have complicated column structures, in which some figure areas have no surrounding rectangles and others cut across text areas. To segment documents into text and figure areas, the approach analyzes the text areas first and then detects the figure areas from information on the text areas. The overall process is as follows. First, character strings are merged into text groups by analyzing regularity in the text areas. Next, border lines of columns are detected by linking the edges of the text groups. The whole page is then segmented into small blocks according to the border lines. The blocks are unified by using the column information, and some unified blocks are detected. Finally, a projection profile method is applied to the unified blocks to detect text areas and figure areas. The method was applied to 61 pages of Japanese technical papers and magazines, and 93.3% of the text areas and 93.2% of the figure areas were detected correctly.
  • Keywords
    document image processing; image segmentation; merging; Japanese magazines; Japanese technical papers; block segmentation method; block unification; border lines; character strings; complicated column structures; document formats; document images; figure areas; page segmentation; projection profile method; regularity; text areas; text groups; Cities and towns; Databases; Image analysis; Image edge detection; Image segmentation; Information analysis; Joining processes; Laboratories; Publishing; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
  • Conference_Location
    Tsukuba Science City
  • Print_ISBN
    0-8186-4960-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1993.395775
  • Filename
    395775
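
The projection profile step named in the abstract can be illustrated with a minimal sketch, assuming a block is given as a binary grid in which 1 marks an ink pixel. The function names, the sample grid, and the zero threshold are illustrative assumptions and are not taken from the paper. The abstract does not state how the profiles are used to tell text areas from figure areas, so the sketch stops at computing a row profile and splitting it into runs of inked rows.

```python
from typing import List, Tuple


def horizontal_profile(block: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary block."""
    return [sum(row) for row in block]


def runs_above(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of
    consecutive profile entries strictly greater than threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of inked rows begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended at the previous row
            start = None
    if start is not None:                  # run reaches the bottom of the block
        runs.append((start, len(profile)))
    return runs


# Hypothetical 7x6 block: two bands of ink separated by blank rows.
block = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

profile = horizontal_profile(block)
bands = runs_above(profile)
print(profile)
print(bands)
```

Regularly spaced bands of similar height are one plausible cue for text lines, but that interpretation is an assumption here and not a claim about the method in the paper.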