Title :
Open Vocabulary Arabic Handwriting Recognition Using Morphological Decomposition
Author :
Hamdani, Mahdi ; Mousa, Amr El-Desoky ; Ney, Hermann
Author_Institution :
Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany
Abstract :
Language Models (LMs) are a very important component of large- and open-vocabulary recognition systems. This paper presents an open-vocabulary approach for Arabic handwriting recognition. The proposed approach uses Arabic word decomposition based on morphological analysis. The vocabulary is a combination of words and the sub-words obtained by the decomposition process, so Out-Of-Vocabulary (OOV) words can be recognized by combining different elements from the lexicon. The recognition system is based on Hidden Markov Models (HMMs) with position- and context-dependent character models. An n-gram LM trained on the decomposed text is used along with the HMMs during the search. The approach is evaluated on two Arabic handwriting datasets, and two different types of experiments are conducted for two Arabic handwriting recognition tasks. The open-vocabulary approach yields an absolute improvement of up to 1% in Word Error Rate (WER) on the constrained task and keeps the performance of the baseline system on the unconstrained one.
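The abstract's core idea is that a vocabulary of words plus morphological sub-words lets the recognizer spell out words never seen whole in training. The sketch below illustrates this with a toy affix-stripping splitter in Buckwalter-style transliteration. The affix lists, the stem dictionary, the `+` boundary marker, and all function names are hypothetical assumptions for illustration; this is not the morphological analyzer or lexicon used in the paper.

```python
from itertools import product

# Hypothetical toy affix inventories (Buckwalter-style transliteration);
# a real system would rely on a full Arabic morphological analyzer instead.
PREFIXES = ("w", "Al", "b", "l")  # "and", "the", "with", "for"
SUFFIXES = ("hA", "hm", "k")      # "her", "their", "your"
MARK = "+"                        # hypothetical boundary marker for re-gluing sub-words


def analyses(word):
    """Yield every (prefixes, stem, suffix) split of `word` using the toy affixes."""
    for n_pre in range(3):  # allow up to two stacked prefixes, e.g. w+Al+
        for pres in product(PREFIXES, repeat=n_pre):
            head = "".join(pres)
            if not word.startswith(head):
                continue
            for suf in ("",) + SUFFIXES:
                if suf and not word.endswith(suf):
                    continue
                stem = word[len(head):len(word) - len(suf)]
                if stem:
                    yield pres, stem, suf


def decompose(word, stems):
    """Split `word` into marked sub-words, keeping the analysis with the fewest
    affixes whose stem is in `stems`; unanalyzable words stay whole."""
    candidates = [a for a in analyses(word) if a[1] in stems]
    if not candidates:
        return [word]
    pres, stem, suf = min(candidates, key=lambda a: len(a[0]) + bool(a[2]))
    return [p + MARK for p in pres] + [stem] + ([MARK + suf] if suf else [])


def recompose(tokens):
    """Glue a recognized sub-word sequence back into full words via the markers."""
    words, glue_next = [], False
    for tok in tokens:
        piece = tok.strip(MARK)
        if words and (glue_next or tok.startswith(MARK)):
            words[-1] += piece
        else:
            words.append(piece)
        glue_next = tok.endswith(MARK)
    return words


# Hypothetical training data: a tiny stem list and three training words.
STEMS = {"ktAb", "qlm", "byt"}  # book, pen, house
TRAIN_WORDS = ["AlktAb", "bqlmk", "byt"]

WORD_LEXICON = set(TRAIN_WORDS)
SUBWORD_LEXICON = {t for w in TRAIN_WORDS for t in decompose(w, STEMS)}


def covered(word):
    """True if `word` can be spelled entirely with entries of the sub-word lexicon."""
    return all(t in SUBWORD_LEXICON for t in decompose(word, STEMS))


if __name__ == "__main__":
    oov = "Albyt"  # "the house": never seen as a full training word
    print(oov in WORD_LEXICON, covered(oov))             # False True
    print(recompose(["Al+", "byt", "b+", "qlm", "+k"]))  # ['Albyt', 'bqlmk']
```

Here "Albyt" is OOV for the word lexicon but is covered by the sub-word lexicon {Al+, ktAb, b+, qlm, +k, byt}. In a full system along these lines, the decomposed training text would also be used to train the n-gram LM, and the recognizer's sub-word output would presumably be recomposed into words before computing the WER.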
Keywords :
handwriting recognition; hidden Markov models; natural language processing; vocabulary; Arabic handwriting datasets; Arabic word decomposition; HMM; OOV words; WER; context dependent character model; hidden Markov model; language model; morphological analysis; morphological decomposition; n-gram LM training; open vocabulary Arabic handwriting recognition system; out of vocabulary words; position dependent character model; system performance improvement; text decomposition; word error rate; Character recognition; Context; Feature extraction; Handwriting recognition; Hidden Markov models; Training; Vocabulary; Arabic handwriting recognition; Hidden Markov Models; Linguistic analysis;
Conference_Title :
2013 12th International Conference on Document Analysis and Recognition (ICDAR)
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.63