• DocumentCode
    2261591
  • Title
    Multi-font recognition of printed Arabic using the BBN BYBLOS speech recognition system
  • Author
    LaPre, Christopher ; Zhao, Ying ; Raphael, Christopher ; Schwartz, Richard ; Makhoul, John
  • Author_Institution
    BBN Syst. & Technol. Corp., Cambridge, MA, USA
  • Volume
    4
  • fYear
    1996
  • fDate
    7-10 May 1996
  • Firstpage
    2136
  • Abstract
    We use a hidden Markov model (HMM) based continuous speech recognition system to perform off-line character recognition (OCR) of Arabic printed text. The HMM trainer and recognizer are used without change; however, we modify the feature extraction stage to compute features relevant to OCR. Although we begin by segmenting the page into a collection of lines, no further segmentation is necessary for either recognition or training. Experiments on the ARPA Arabic data corpus yield character error rates ranging from under one percent for a single computer font to 2.8% for multiple-font recognition of a wide range of material from books, magazines and newspapers.
  • Keywords
    feature extraction; hidden Markov models; image segmentation; optical character recognition; speech recognition; ARPA Arabic data corpus; BBN BYBLOS speech recognition system; HMM; HMM recognizer; HMM trainer; books; character error rates; continuous speech recognition system; experiments; feature extraction; hidden Markov model; magazines; multifont recognition; newspapers; off-line character recognition; page segmentation; printed Arabic; single computer font; training; Character recognition; Error analysis; Feature extraction; Handwriting recognition; Hidden Markov models; Histograms; Optical character recognition software; Optical materials; Speech recognition; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-3192-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1996.545738
  • Filename
    545738