DocumentCode :
2701047
Title :
Joint Morphological-Lexical Language Modeling (JMLLM) for Arabic
Author :
Sarikaya, R. ; Afify, M. ; Gao, Yuan
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
4
fYear :
2007
fDate :
15-20 April 2007
Abstract :
Language modeling for inflected languages such as Arabic poses new challenges for speech recognition due to rich morphology. The rich morphology results in large increases in perplexity and out-of-vocabulary (OOV) rate. In this study, we present a new language modeling method that takes advantage of Arabic morphology by combining, within a single joint model, morphological segments, the underlying lexical items, and additional available information sources about both. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates. Preliminary experiments detailed in this paper show satisfactory improvements over word- and morpheme-based trigram language models and their interpolations.
Keywords :
natural language processing; speech recognition; Arabic morphology; joint morphological-lexical language modeling; lexical items; morpheme based trigram language models; out-of-vocabulary rate; perplexity; Algorithm design and analysis; Entropy; Interpolation; Morphology; Natural languages; Parameter estimation; Probability; Robustness; Vocabulary; Joint Modeling; Language Modeling; Maximum Entropy Modeling; Morphological Analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
ISSN :
1520-6149
Print_ISBN :
1-4244-0727-3
Type :
conf
DOI :
10.1109/ICASSP.2007.367193
Filename :
4218067