Title :
Form processing based on background region analysis
Author :
Arai, Hiroyuki ; Odaka, Kazumi
Author_Institution :
NTT Human Interface Labs., Kanagawa, Japan
Abstract :
We present a novel approach for processing form documents based on background region analysis. Our goal is to achieve line-property-free form processing. Background regions can be extracted independently of line width or length, and multi-layer analysis employing a series of coarse-to-fine background images makes it possible to extract background regions regardless of small line breaks. We propose two multi-layer analysis algorithms for different situations. One is applied in the registration process of a form model. It reliably extracts box regions from unfilled forms without using any model. The other is applied in the character extraction process. By using a spatial model of a form, it reliably extracts background regions and re-integrates these regions if they are divided by characters written in the boxes. From these re-integrated regions, the exact locations of the character boxes are determined on the input image. Besides these algorithms, we present a form identification method that uses coarse background images. We implemented the algorithms in a prototype system that processes pre-printed forms. Fifty types of existing forms were tested without any customization. Model registration, character extraction, and form identification were reliably carried out.
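The coarse-to-fine idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it only shows, under simple assumptions, why a coarser background image is insensitive to a small line break. The names `coarsen`, `count_background_regions`, and `box_with_break` are hypothetical, the coarse image is assumed to be a block-wise OR of the ink pixels, and background regions are assumed to be 4-connected components of non-ink pixels.

```python
from collections import deque


def coarsen(img, k):
    """Block-OR downsample: a coarse cell is ink (1) if any pixel
    in its k x k block is ink. (Assumed coarsening rule.)"""
    h, w = len(img), len(img[0])
    return [
        [
            1 if any(
                img[y][x]
                for y in range(by, min(by + k, h))
                for x in range(bx, min(bx + k, w))
            ) else 0
            for bx in range(0, w, k)
        ]
        for by in range(0, h, k)
    ]


def count_background_regions(img):
    """Count 4-connected components of background (0) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] or seen[sy][sx]:
                continue
            # New background region found: flood-fill it breadth-first.
            count += 1
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not img[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def box_with_break(size=12, lo=2, hi=9, gap=(2, 5)):
    """Toy form image: one square box outline with a one-pixel break."""
    img = [[0] * size for _ in range(size)]
    for i in range(lo, hi + 1):
        img[lo][i] = img[hi][i] = 1  # top and bottom edges
        img[i][lo] = img[i][hi] = 1  # left and right edges
    img[gap[0]][gap[1]] = 0          # small line break in the top edge
    return img


if __name__ == "__main__":
    form = box_with_break()
    print("fine level regions:  ", count_background_regions(form))
    print("coarse level regions:", count_background_regions(coarsen(form, 2)))
```

On this toy image the fine level yields a single background region, because the box interior leaks into the surrounding background through the break, while the level coarsened by a factor of 2 yields two regions (outside and inside the box). A multi-layer method could use the coarse result to locate the box and the finer levels to refine its boundary; how the paper actually combines the layers is not stated in the abstract.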
Keywords :
document image processing; feature extraction; image registration; image segmentation; optical character recognition; OCR technologies; background region analysis; background region extraction; character extraction; coarse background images; form identification method; form processing; line-property-free form processing; multilayer analysis algorithms; re-integrated regions; registration process; Algorithm design and analysis; Data mining; Humans; Image analysis; Laboratories; Optical character recognition software; Prototypes; System testing; Telegraphy; Telephony
Conference_Title :
Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
DOI :
10.1109/ICDAR.1997.619834