• DocumentCode
    3020552
  • Title
    Performance improvements to the BBN Byblos OCR system
  • Author
    Decerbo, Michael ; Natarajan, Premkumar ; Prasad, Rohit ; Macrostie, Ehry ; Arun, Ravindran
  • Author_Institution
    BBN Technol., Cambridge, MA, USA
  • fYear
    2005
  • fDate
    29 Aug.-1 Sept. 2005
  • Firstpage
    411
  • Abstract
    In this paper, we describe four recent enhancements to the BBN Byblos OCR system, a multilingual HMM-based character recognition system that has been demonstrated on a variety of languages, including English, Arabic, Chinese, and Japanese. These enhancements are implemented as optional extensions to the system and improve performance for certain scripts or domains. Projection-based re-estimation of line boundaries reduces instability in the presence of some types of noise. An alternative modeling strategy, used in the first of two recognition search passes, substantially increases speed for languages with a large number of characters. A further speed improvement comes from the automatic discovery and modeling of sub-characters. Finally, heteroscedastic linear discriminant analysis (HLDA) makes modeling more tractable by reducing the dimensionality of the feature space.
  • Keywords
    hidden Markov models; natural languages; optical character recognition; BBN Byblos OCR system; automatic discovery; heteroscedastic linear discriminant analysis; hidden Markov model; multilingual HMM; optical character recognition system; performance improvement; projection reestimation; Character recognition; Feature extraction; Gaussian processes; Hidden Markov models; Natural languages; Noise reduction; Noise robustness; Optical character recognition software; Probability; Topology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
  • ISSN
    1520-5263
  • Print_ISBN
    0-7695-2420-6
  • Type
    conf
  • DOI
    10.1109/ICDAR.2005.189
  • Filename
    1575579