• DocumentCode
    2060541
  • Title
    Form processing based on background region analysis
  • Author
    Arai, Hiroyuki ; Odaka, Kazumi
  • Author_Institution
    NTT Human Interface Labs., Kanagawa, Japan
  • Volume
    1
  • fYear
    1997
  • fDate
    18-20 Aug 1997
  • Firstpage
    164
  • Abstract
    We present a novel approach for processing form documents based on background region analysis. Our goal is to achieve line-property-free form processing. Background regions can be extracted independently of line width or length, and multi-layer analysis employing a series of coarse-to-fine background images makes it possible to extract background regions regardless of small line-breaks. We propose two multi-layer analysis algorithms for different situations. One is applied in the registration process of a form model. It reliably extracts box regions from unfilled forms without using any model. The other is applied in the character extraction process. By using a spatial model of a form, it reliably extracts background regions and re-integrates these regions if they are divided by characters written in the boxes. From these re-integrated regions, the exact locations of the character boxes are determined on the input image. In addition to these algorithms, we present a form identification method that uses coarse background images. We implemented the algorithms in a prototype system that processes pre-printed forms. Fifty types of existing forms were tested without any customization. Model registration, character extraction, and form identification were reliably carried out.
  • Keywords
    document image processing; feature extraction; image registration; image segmentation; optical character recognition; OCR technologies; background region analysis; background region extraction; character extraction; coarse background images; form identification method; form processing; line-property-free form processing; multilayer analysis algorithms; re-integrated regions; registration process; Algorithm design and analysis; Data mining; Humans; Image analysis; Laboratories; Optical character recognition software; Prototypes; System testing; Telegraphy; Telephony;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
  • Conference_Location
    Ulm
  • Print_ISBN
    0-8186-7898-4
  • Type
    conf
  • DOI
    10.1109/ICDAR.1997.619834
  • Filename
    619834