  • DocumentCode
    778877
  • Title
    Omnidocument technologies
  • Author
    Bokser, Mindy
  • Author_Institution
    Calera Recognition Syst. Inc., Sunnyvale, CA, USA
  • Volume
    80
  • Issue
    7
  • fYear
    1992
  • fDate
    7/1/1992 12:00:00 AM
  • Firstpage
    1066
  • Lastpage
    1078
  • Abstract
    An optical character recognition (OCR) engine that is omnifont and reasonably robust on individual degraded characters is presented. Its weakest link is the handling of characters that are difficult to segment. The engine is divided into four phases: segmentation, image recognition, ambiguity resolution, and document analysis. The features are zonal and reduce the image to a blurred, gray-level representation. The classifier is data-driven, trained offline, and model-free; by contrast, handcrafted features and decision trees tend to be brittle in the presence of noise. To satisfy the needs of full-text applications, the system captures the structure of the document so that, when the recognized document is viewed in a word processor or spreadsheet program, its formatting reflects that of the original. To satisfy the needs of the forms market, a proofing and correction tool displays 'pop-up' images of uncertain characters.
  • Keywords
    document image processing; optical character recognition; ambiguity resolution; correction tool; degraded characters; document analysis; formatting; full-text applications; gray-level representation; image recognition; optical character recognition; proofing; segmentation; Character recognition; Degradation; Engines; Image recognition; Image resolution; Image segmentation; Optical character recognition software; Optical noise; Robustness; Text analysis;
  • fLanguage
    English
  • Journal_Title
    Proceedings of the IEEE
  • Publisher
    IEEE
  • ISSN
    0018-9219
  • Type
    jour
  • DOI
    10.1109/5.156470
  • Filename
    156470
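The abstract states that the engine's features are zonal and reduce a character image to a blurred, gray-level representation. The sketch below illustrates generic zoning: the bitmap is split into a grid of zones and each zone is replaced by its fraction of ink pixels. It assumes a rectangular binary bitmap (1 = ink) and a default 4×4 grid, and the function name `zonal_features` is hypothetical. The paper's actual zone layout, resolution, and normalization are not given in this record, so treat this as an illustration of the idea rather than the engine's feature extractor.

```python
from typing import List


def zonal_features(bitmap: List[List[int]], rows: int = 4, cols: int = 4) -> List[float]:
    """Reduce a binary character bitmap (1 = ink) to a rows x cols gray-level grid.

    Hypothetical illustration of zoning: each output value is the fraction of
    ink pixels in one zone, giving a blurred, low-resolution gray-level image
    flattened in row-major order.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("bitmap must be non-empty")
    if any(len(row) != width for row in bitmap):
        raise ValueError("bitmap must be rectangular")

    features: List[float] = []
    for zr in range(rows):
        # Proportional boundaries cope with sizes not divisible by the grid;
        # each zone spans at least one pixel (zones may overlap on tiny images).
        r0 = zr * height // rows
        r1 = max((zr + 1) * height // rows, r0 + 1)
        for zc in range(cols):
            c0 = zc * width // cols
            c1 = max((zc + 1) * width // cols, c0 + 1)
            ink = sum(bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
            features.append(ink / ((r1 - r0) * (c1 - c0)))
    return features


if __name__ == "__main__":
    # A 4x4 glyph whose left half is inked, reduced to a 2x2 zone grid.
    glyph = [[1, 1, 0, 0]] * 4
    print(zonal_features(glyph, rows=2, cols=2))  # [1.0, 0.0, 1.0, 0.0]
```

Averaging over zones discards exact pixel positions, which is one plausible reason such features tolerate small amounts of noise better than features keyed to individual strokes.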