Title :
A new method to construct a statistical model for Arabic language
Author :
Sadiqui, Ali ; Zinedine, Ahmed
Author_Institution :
Fac. of Sci. Dhar El Mahrez, Sidi Mohamed Ben Abdellah Univ., Atlas, Morocco
Abstract :
Language models are a key component of modern automatic language processing systems. In this study we present a new approach to building a statistical language model of Arabic for non-vocalized texts. The approach makes it possible to overcome the morphological complexity of Arabic and to address the limitations of existing morphological analyzers. The classic approach adopted by most morphological analyzers takes each word out of its context and therefore generates several segmentation options. Our solution uses trellises both to retain the segmentation possibilities generated by the morphological analyzer and to build the language model. To implement this solution, we used the following tools: AraMorph and Lattice-Tool from the SRILM toolkit and AT & WSF. The language model was estimated from a corpus of 100 K words and tested on a corpus of 7 K words. The results and their analysis are presented in this document.
Keywords :
computational linguistics; natural language processing; statistical analysis; text analysis; Arabic language processing; language model; morphological analyzer; nonvocalized text; statistical model; Analytical models; Complexity theory; Context; Decision support systems; Arabic Language Model; Automatic Arabic Language processing; Non-vocalized text; Statistical Model;
Conference_Title :
Information Science and Technology (CIST), 2014 Third IEEE International Colloquium in
Conference_Location :
Tetouan
Print_ISBN :
978-1-4799-5978-5
DOI :
10.1109/CIST.2014.7016635