• DocumentCode
    3140925
  • Title
    Statistical approach to feature extraction for numeral recognition from degraded documents
  • Author
    Vaidya, Vinay ; Deshpande, Vikram ; Gada, Dhiral ; Shirole, Bhagyashree
  • Author_Institution
    Sci. Appl. Center, Siemens Inf. Syst. Ltd., Pune, India
  • fYear
    1999
  • fDate
    20-22 Sep 1999
  • Firstpage
    273
  • Lastpage
    276
  • Abstract
    Proposes a fast statistical method for numeral recognition from degraded documents. Our method uses a feature-based approach in which each feature is assigned a weight, together with a factor that defines how much of the feature is detectable. We describe an object by how it looks or does not look, giving positive or negative weights to the corresponding features. In the proposed method, the positive and negative weights together describe a numeral completely and help to clearly distinguish one numeral from another. The recognition rate obtained by this method is much higher than that obtained by a simple feature-based method. We present results for 160 bills printed on a dot matrix printer. The printed output varies widely, including faded bills and highly smudged output. With this level of distortion, the numerals are sometimes ambiguous to human eyes. However, our method recognizes the distorted numerals with a success rate of 97%. The algorithm takes about 0.01 seconds to recognize a single numeral on a 200 MHz Pentium machine.
  • Keywords
    computer stationery; document image processing; feature extraction; invoicing; matrix printers; optical character recognition; printing; statistics; 0.01 s; 200 MHz; Pentium-based machine; ambiguous numerals; degraded documents; distorted numerals; dot matrix printer; faded bills; fast statistical method; feature detectability; feature extraction; numeral recognition; printed bills; printed output variations; recognition rate; smudged output; weights; Computer vision; Degradation; Feature extraction; Humans; Information systems; Optical character recognition software; Packaging machines; Printers; Statistical analysis; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    0-7695-0318-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1999.791777
  • Filename
    791777
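
The weighting idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the weight values, and the helpers `score` and `classify` are all hypothetical, chosen only to show how positive weights (feature expected), negative weights (feature expected to be absent), and a per-feature detectability factor in [0, 1] could combine into a score per numeral.

```python
from typing import Dict

# Hypothetical weight table (not from the paper). A positive weight means the
# feature is expected for that numeral; a negative weight means its presence
# counts against that numeral.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "0": {"closed_loop": 1.0, "vertical_stroke": -0.5, "top_bar": -0.5},
    "1": {"closed_loop": -1.0, "vertical_stroke": 1.0, "top_bar": -0.5},
    "7": {"closed_loop": -1.0, "vertical_stroke": 0.5, "top_bar": 1.0},
}


def score(detectability: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight * detectability over a numeral's features.

    `detectability` maps a feature name to how much of that feature was
    found in the degraded image, from 0.0 (absent) to 1.0 (fully visible).
    Features missing from the mapping are treated as not detected.
    """
    return sum(w * detectability.get(name, 0.0) for name, w in weights.items())


def classify(detectability: Dict[str, float]) -> str:
    """Return the numeral whose weighted score is highest."""
    return max(WEIGHTS, key=lambda digit: score(detectability, WEIGHTS[digit]))


if __name__ == "__main__":
    # A faded "7": the top bar is only 60% visible, the stroke 40% visible.
    faded_seven = {"closed_loop": 0.0, "vertical_stroke": 0.4, "top_bar": 0.6}
    print(classify(faded_seven))
```

In this sketch a partially detected feature still contributes in proportion to its detectability, and the negative weights penalize numerals for features that should not be present, which is one plausible reading of how the two kinds of weight help separate similar numerals.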