• DocumentCode
    1459201
  • Title
    A generic system for form dropout
  • Author
    Yu, Bin ; Jain, Anil K.
  • Author_Institution
    Dept. of Comput. Sci., Michigan State Univ., East Lansing, MI, USA
  • Volume
    18
  • Issue
    11
  • fYear
    1996
  • fDate
    11/1/1996 12:00:00 AM
  • Firstpage
    1127
  • Lastpage
    1134
  • Abstract
    Recent advances in intelligent character recognition are enabling us to address many challenging problems in document image analysis. One of them is intelligent form analysis. This paper describes a generic system for form dropout when the filled-in characters or symbols are either touching or crossing the form frames. We propose a method to separate these characters from form frames whose locations are unknown. Since some of the character strokes are either touching or crossing the form frames, we need to address the following three issues: 1) localization of form frames; 2) separation of characters and form frames; and 3) reconstruction of broken strokes introduced during separation. The form frame is automatically located by finding long straight lines based on the block adjacency graph. Form frame separation and character reconstruction are implemented by means of this graph. The proposed system includes form structure learning and form dropout. First, a form structure-based template is automatically generated from a blank form which includes form frames, preprinted data areas and skew angle. With this form template, our system can then extract both handwritten and machine-typed filled-in data. Experimental results on three different types of forms show the performance of our system. Further, the proposed method is robust to noise and skew introduced during scanning.
  • Keywords
    document image processing; image reconstruction; image segmentation; knowledge based systems; optical character recognition; block adjacency graph; broken stroke reconstruction; character reconstruction; document image analysis; form dropout; form structure learning; form structure-based template; intelligent character recognition; intelligent form analysis; preprinted data areas; skew angle; Character recognition; Credit cards; Data mining; Government; Image analysis; Image reconstruction; Image segmentation; Ink; Noise robustness; Text analysis
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/34.544084
  • Filename
    544084