• DocumentCode
    2397273
  • Title

    Language-independent OCR using a continuous speech recognition system

  • Author

    Schwartz, Richard ; LaPre, Christopher ; Makhoul, John ; Raphael, Christopher ; Zhao, Ying

  • Author_Institution
    BBN Syst. & Technol. Corp., Cambridge, MA, USA
  • Volume
    3
  • fYear
    1996
  • fDate
    25-29 Aug 1996
  • Firstpage
    99
  • Abstract
    In this paper we show how continuous speech recognition methods can be used for character recognition, resulting in a technology that is language independent and does not require presegmentation of the data at the character and word levels. In multifont experiments on the ARPA Arabic OCR Corpus, an average character error rate of 1.9% is obtained using the BBN BYBLOS continuous speech recognition system with no modifications. A first experiment using the identical system and procedures, trained and tested on a subset of the English Univ. of Washington OCR corpus, resulted in a 1.4% character error rate.
  • Keywords
    hidden Markov models; optical character recognition; speech recognition; ARPA Arabic OCR Corpus; BBN BYBLOS continuous speech recognition system; continuous speech recognition system; language-independent OCR; multifont experiments; Character recognition; Error analysis; Error correction; Hidden Markov models; Image segmentation; Natural languages; Optical character recognition software; Pattern recognition; Speech recognition; System testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 1996., Proceedings of the 13th International Conference on
  • Conference_Location
    Vienna
  • ISSN
    1051-4651
  • Print_ISBN
    0-8186-7282-X
  • Type

    conf

  • DOI
    10.1109/ICPR.1996.546802
  • Filename
    546802