DocumentCode :
835985
Title :
Joint Morphological-Lexical Language Modeling for Processing Morphologically Rich Languages With Application to Dialectal Arabic
Author :
Sarikaya, Ruhi ; Afify, Mohamed ; Deng, Yonggang ; Erdogan, Hakan ; Gao, Yuqing
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY
Volume :
16
Issue :
7
Year :
2008
Firstpage :
1330
Lastpage :
1339
Abstract :
Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in the out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items, and with additional available information sources with regard to morphological segments and lexical items, in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal Arabic show improvements over word- and morpheme-based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performance improve.
Keywords :
language translation; speech recognition; statistical analysis; Arabic morphology; dialectal Arabic; joint morphological-lexical language modeling; language model parameter estimation; machine translation; morphological segments; morphologically rich languages; out-of-vocabulary rate; rich morphology; smooth probability estimation; speech recognition; trigram language models; Entropy; Information technology; Morphology; Natural language processing; Natural languages; Parameter estimation; Predictive models; Robustness; Speech recognition; Vocabulary; Joint modeling; language modeling; maximum entropy modeling; morphological analysis;
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
Journal article
DOI :
10.1109/TASL.2008.924591
Filename :
4599398