Title :
Multi-font recognition of printed Arabic using the BBN BYBLOS speech recognition system
Author :
LaPre, Christopher ; Zhao, Ying ; Raphael, Christopher ; Schwartz, Richard ; Makhoul, John
Author_Institution :
BBN Syst. & Technol. Corp., Cambridge, MA, USA
Abstract :
We use a hidden Markov model (HMM) based continuous speech recognition system to perform off-line character recognition (OCR) of Arabic printed text. The HMM trainer and recognizer are used without change; however, we modify the feature extraction stage to compute features relevant to OCR. Although we begin by segmenting the page into a collection of lines, no further segmentation is necessary for either recognition or training. Experiments on the ARPA Arabic data corpus yield a range of character error rates from under one percent for a single computer font to 2.8% for multiple-font recognition of a wide range of material from books, magazines and newspapers.
Keywords :
feature extraction; hidden Markov models; image segmentation; optical character recognition; speech recognition; ARPA Arabic data corpus; BBN BYBLOS speech recognition system; HMM; HMM recognizer; HMM trainer; books; character error rates; continuous speech recognition system; experiments; feature extraction; hidden Markov model; magazines; multifont recognition; newspapers; off-line character recognition; page segmentation; printed Arabic; single computer font; training; Character recognition; Error analysis; Feature extraction; Handwriting recognition; Hidden Markov models; Histograms; Optical character recognition software; Optical materials; Speech recognition; Text recognition;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
0-7803-3192-3
DOI :
10.1109/ICASSP.1996.545738