DocumentCode
2261591
Title
Multi-font recognition of printed Arabic using the BBN BYBLOS speech recognition system
Author
LaPre, Christopher ; Zhao, Ying ; Raphael, Christopher ; Schwartz, Richard ; Makhoul, John
Author_Institution
BBN Syst. & Technol. Corp., Cambridge, MA, USA
Volume
4
fYear
1996
fDate
7-10 May 1996
Firstpage
2136
Abstract
We use a hidden Markov model (HMM) based continuous speech recognition system to perform off-line character recognition (OCR) of Arabic printed text. The HMM trainer and recognizer are used without change; however, we modify the feature extraction stage to compute features relevant to OCR. Although we begin by segmenting the page into a collection of lines, no further segmentation is necessary for either recognition or training. Experiments on the ARPA Arabic data corpus yield a range of character error rates from under one percent for a single computer font to 2.8% for multiple-font recognition of a wide range of material from books, magazines and newspapers.
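The abstract reports character error rates but does not say how they are scored. A common convention, assumed here and not confirmed by this record, is the Levenshtein edit distance between the reference and recognized character strings, divided by the reference length. The sketch below illustrates that convention only; the function names `edit_distance` and `character_error_rate` are hypothetical and are not taken from the BYBLOS system.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Assumed scoring convention: edit distance over reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this convention, the reported 2.8% figure would correspond to roughly 28 character errors per 1000 reference characters.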
Keywords
feature extraction; hidden Markov models; image segmentation; optical character recognition; speech recognition; ARPA Arabic data corpus; BBN BYBLOS speech recognition system; HMM; HMM recognizer; HMM trainer; books; character error rates; continuous speech recognition system; experiments; feature extraction; hidden Markov model; magazines; multifont recognition; newspapers; off-line character recognition; page segmentation; printed Arabic; single computer font; training; Character recognition; Error analysis; Feature extraction; Handwriting recognition; Hidden Markov models; Histograms; Optical character recognition software; Optical materials; Speech recognition; Text recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location
Atlanta, GA
ISSN
1520-6149
Print_ISBN
0-7803-3192-3
Type
conf
DOI
10.1109/ICASSP.1996.545738
Filename
545738