Title :
Hybrid word/Part-of-Arabic-Word Language Models for Arabic text document recognition
Author :
Mohamed Faouzi BenZeghiba;Jérôme Louradour;Christopher Kermorvant
Author_Institution :
A2iA S.A., 39 rue de la Bienfaisance, 75008 - Paris - France
Abstract :
This paper describes a simple approach to generating an efficient hybrid word/Part-of-Arabic-Word (PAW) Language Model (LM). More precisely, the less frequent words in a full-word vocabulary are decomposed into PAWs. The resulting PAWs are combined with the most frequent words to form a hybrid word-PAW vocabulary, which is used to estimate a hybrid flat n-gram statistical language model. For comparison purposes, language models with full PAW decomposition of the word vocabulary are also generated. To assess the quality of the three types of LMs (i.e. full word, hybrid word/PAW and full PAW LMs), evaluation experiments are conducted on three different tasks using two benchmark databases, namely Maurdor and Khatt. Results in terms of word error rate show that systems using the full PAW LMs and the proposed hybrid LMs perform equally well, and that both systematically outperform systems using word LMs. However, systems using hybrid LMs require less memory than those using full PAW LMs.
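The vocabulary construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `n_frequent` cutoff, and the `NON_JOINING` letter set (letters assumed not to connect to the following letter, so that a PAW ends after them) are assumptions made for illustration. The sketch also does not mark word boundaries inside decomposed words, which a real system would need in order to rebuild words from PAWs.

```python
from collections import Counter

# Assumption: Arabic letters that do not join to the next letter.
# A PAW is taken to end right after any of these characters.
NON_JOINING = set("اأإآدذرزوؤء")


def split_into_paws(word):
    """Split a word into PAWs by cutting after each non-joining letter."""
    paws, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_JOINING:
            paws.append(current)
            current = ""
    if current:
        paws.append(current)
    return paws


def build_hybrid_vocabulary(tokens, n_frequent):
    """Keep the n_frequent most frequent words whole; decompose the rest."""
    counts = Counter(tokens)
    frequent = {w for w, _ in counts.most_common(n_frequent)}
    vocab = set(frequent)
    for w in counts:
        if w not in frequent:
            vocab.update(split_into_paws(w))
    return frequent, vocab


def to_hybrid_tokens(tokens, frequent):
    """Rewrite a token stream so that words and PAWs sit at the same level,
    ready for estimating a flat n-gram model with any LM toolkit."""
    out = []
    for w in tokens:
        if w in frequent:
            out.append(w)
        else:
            out.extend(split_into_paws(w))
    return out
```

Setting `n_frequent` to 0 yields the full PAW decomposition used for comparison, while setting it to the vocabulary size yields the full word vocabulary.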
Keywords :
"Handwriting recognition","Text recognition","Training","Adaptive optics","Optical imaging","Vocabulary","Hybrid power systems"
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
DOI :
10.1109/ICDAR.2015.7333846