• DocumentCode
    778900
  • Title

    At the frontiers of OCR

  • Author

    Nagy, George

  • Author_Institution
    Rensselaer Polytech. Inst., Troy, NY, USA
  • Volume
    80
  • Issue
    7
  • fYear
    1992
  • fDate
    7/1/1992 12:00:00 AM
  • Firstpage
    1093
  • Lastpage
    1100
  • Abstract
    It is argued that it is time for a major change of approach to optical character recognition (OCR) research. The traditional approach, focusing on the correct classification of isolated characters, has been exhausted. The demonstration of the superiority of a new classification method under operational conditions requires large experimental facilities and databases beyond the resources of most researchers. In any case, even perfect classification of individual characters is insufficient for the conversion of complex archival documents to a useful computer-readable form. Many practical OCR tasks require integrated treatment of entire documents and well-organized typographic and domain-specific knowledge. New OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components. They should also exploit the unavoidable interaction with human operators to improve themselves without explicit 'training'.
  • Keywords
    document image processing; optical character recognition; archival documents; classification method; domain-specific knowledge; layout components; typographic uniformity; Character recognition; Classification algorithms; Computer graphics; Feature extraction; Humans; Layout; Logic; Optical arrays; Optical character recognition software; Text analysis
  • fLanguage
    English
  • Journal_Title
    Proceedings of the IEEE
  • Publisher
    IEEE
  • ISSN
    0018-9219
  • Type

    jour

  • DOI
    10.1109/5.156472
  • Filename
    156472