  • DocumentCode
    1246894
  • Title
    High accuracy optical character recognition using neural networks with centroid dithering
  • Author
    Avi-Itzhak, Hadar I.; Diep, Thanh A.; Garland, Harry
  • Author_Institution
    Dept. of Electr. Eng., Stanford Univ., CA, USA
  • Volume
    17
  • Issue
    2
  • fYear
    1995
  • fDate
    2/1/1995
  • Firstpage
    218
  • Lastpage
    224
  • Abstract
    Optical character recognition (OCR) refers to a process whereby printed documents are transformed into ASCII files for the purpose of compact storage, editing, fast retrieval, and other file manipulations through the use of a computer. The recognition stage of an OCR process is made difficult by added noise, image distortion, and the various character typefaces, sizes, and fonts that a document may have. In this study a neural network approach is introduced to perform high-accuracy recognition on multi-size and multi-font characters; a novel centroid-dithering training process with a low noise-sensitivity normalization procedure is used to achieve high-accuracy results. The study consists of two parts. The first part focuses on single-size, single-font characters: a two-layered neural network is trained to recognize the full set of 94 ASCII character images in 12-pt Courier font. The second part trades accuracy for additional font and size capability: a larger two-layered neural network is trained to recognize the full set of 94 ASCII character images for all point sizes from 8 to 32 and for 12 commonly used fonts. The performance of these two networks is evaluated on a database of more than one million character images from the testing data set.
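    The abstract names centroid dithering but does not define it. The sketch below assumes it means augmenting the training set by placing each character image in a fixed-size input frame at its intensity-weighted centroid and then shifting it by small integer offsets, so the network sees slightly displaced copies of every glyph. The function names, the `radius` parameter, and the frame size are hypothetical; this is an illustrative interpretation, not the authors' published procedure, and it does not implement their low noise-sensitivity normalization.

```python
from typing import List, Tuple

# Hypothetical representation: row-major grayscale image, 0 = background.
Image = List[List[float]]


def centroid(img: Image) -> Tuple[float, float]:
    """Intensity-weighted centroid (row, col) of an image with some foreground."""
    total = sum(sum(row) for row in img)
    if total == 0:
        raise ValueError("image has no foreground pixels")
    r = sum(i * v for i, row in enumerate(img) for v in row) / total
    c = sum(j * v for row in img for j, v in enumerate(row)) / total
    return r, c


def place_at_centroid(img: Image, out_h: int, out_w: int,
                      dr: int = 0, dc: int = 0) -> Image:
    """Copy img into an out_h x out_w frame so that its centroid, rounded to the
    nearest pixel (Python's round() uses half-to-even), lands at the frame centre
    shifted by (dr, dc). Pixels that fall outside the frame are clipped."""
    cr, cc = centroid(img)
    off_r = out_h // 2 + dr - round(cr)
    off_c = out_w // 2 + dc - round(cc)
    out = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(img):
        for j, v in enumerate(row):
            y, x = i + off_r, j + off_c
            if 0 <= y < out_h and 0 <= x < out_w:
                out[y][x] = v
    return out


def centroid_dither(img: Image, out_h: int, out_w: int,
                    radius: int = 1) -> List[Image]:
    """Assumed dithering scheme: one training copy per integer offset
    (dr, dc) with |dr| <= radius and |dc| <= radius."""
    return [place_at_centroid(img, out_h, out_w, dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)]


if __name__ == "__main__":
    plus = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]  # toy 3x3 "+" glyph
    for copy in centroid_dither(plus, 7, 7, radius=1):
        print(centroid(copy))
```

With `radius=1` each glyph yields nine shifted copies whose centroids cover a 3x3 neighbourhood of the frame centre; a larger radius trades training-set size for more tolerance to segmentation offsets.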
  • Keywords
    character sets; neural nets; optical character recognition; ASCII character images; centroid dithering; low noise-sensitivity normalization procedure; multi-size multi-font characters; two-layered neural network; Character recognition; Image databases; Image recognition; Neural networks; Optical character recognition software; Optical computing; Optical distortion; Optical fiber networks; Optical noise; Testing
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/34.368165
  • Filename
    368165