Title :
Evaluation of the ambiguity caused by the absence of diacritical marks in Arabic texts: Statistical study
Author :
Mohamed Boudchiche;Azzeddine Mazroui
Author_Institution :
Department of Mathematics and Computer Science, Faculty of Sciences, University Mohamed First, B-P 717, 60000, Oujda, Morocco
Abstract :
This work falls within the field of natural language processing. Its objective is to assess the level of ambiguity caused by the absence of diacritical marks in Arabic texts during information extraction. We carried out a statistical study based on four indicators: the root, the lemma, the stem, and the POS tag of each word. For this purpose, we used a large vowelized corpus of more than 80 million words collected from several sources. The study showed that the absence of diacritical marks is the main cause of the ambiguity observed in the information extraction process. We therefore conclude that using a vowelized corpus considerably reduces this ambiguity.
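The kind of measurement the abstract describes can be illustrated with a minimal sketch. It is not the authors' procedure: the metric (mean number of distinct indicator values per surface form), the function names, and the three-word toy sample are all assumptions made for illustration. The sketch strips the common Arabic diacritics (U+064B to U+0652, plus U+0670) and compares how many POS tags share one surface form with and without vowelization.

```python
import re
from collections import defaultdict

# Common Arabic diacritics: tanwin, fatha, damma, kasra, shadda, sukun,
# plus the superscript (dagger) alif. This range is an assumption about
# which marks count as "diacritical" for the purpose of the sketch.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(word: str) -> str:
    """Return the unvowelized form of an Arabic word."""
    return DIACRITICS.sub("", word)


def mean_ambiguity(entries, indicator, vowelized):
    """Mean number of distinct `indicator` values per surface form.

    Hypothetical metric: a value of 1.0 means every surface form maps
    to exactly one value of the indicator (no ambiguity).
    """
    groups = defaultdict(set)
    for entry in entries:
        key = entry["word"] if vowelized else strip_diacritics(entry["word"])
        groups[key].add(entry[indicator])
    return sum(len(values) for values in groups.values()) / len(groups)


# Hypothetical toy sample, written with escapes so the marks are explicit.
TOY = [
    # kataba ("he wrote"): kaf+fatha, ta+fatha, ba+fatha
    {"word": "\u0643\u064E\u062A\u064E\u0628\u064E", "pos": "verb"},
    # kutub ("books"): kaf+damma, ta+damma, ba
    {"word": "\u0643\u064F\u062A\u064F\u0628", "pos": "noun"},
    # madrasa ("school"): mim+fatha, dal+sukun, ra+fatha, sin+fatha, ta marbuta
    {"word": "\u0645\u064E\u062F\u0652\u0631\u064E\u0633\u064E\u0629", "pos": "noun"},
]

if __name__ == "__main__":
    print(mean_ambiguity(TOY, "pos", vowelized=False))  # 1.5
    print(mean_ambiguity(TOY, "pos", vowelized=True))   # 1.0
```

In this toy sample the two vowelized forms kataba and kutub collapse to the same unvowelized string, so that string carries two POS tags. The same function could be applied to a root, lemma, or stem field if the entries carried one.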
Keywords :
"Natural language processing","Semantics","Context","Computer science","Syntactics","Search engines"
Conference_Title :
Information & Communication Technology and Accessibility (ICTA), 2015 5th International Conference on
DOI :
10.1109/ICTA.2015.7426904