• DocumentCode
    778931
  • Title
    Major components of a complete text reading system
  • Author
    Tsujimoto, Shuichi; Asada, Haruo
  • Author_Institution
    Toshiba Corp., Kawasaki, Japan
  • Volume
    80
  • Issue
    7
  • fYear
    1992
  • fDate
    7/1/1992 12:00:00 AM
  • Firstpage
    1133
  • Lastpage
    1149
  • Abstract
    The document image processes used in a recently developed text reading system are described. The system consists of three major components: document analysis, document understanding, and character segmentation/recognition. The document analysis component extracts lines of text from a page for recognition. The document understanding component extracts logical relationships between the document constituents. The character segmentation/recognition component extracts characters from a text line and recognizes them. Experiments on more than a hundred documents have proved that the proposed approaches to document analysis and document understanding are robust even for multicolumned and multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters which frequently touch each other.
  • Keywords
    document image processing; optical character recognition; OCR; character segmentation; document analysis; document understanding; graphics; multiarticle documents; omnifont characters; photographs; text reading system; Character recognition; Document image processing; Graphics; Image segmentation; Marine vehicles; Robustness; Solid modeling; Text analysis; Text recognition; Tree graphs
  • fLanguage
    English
  • Journal_Title
    Proceedings of the IEEE
  • Publisher
    IEEE
  • ISSN
    0018-9219
  • Type
    jour
  • DOI
    10.1109/5.156475
  • Filename
    156475