DocumentCode
3238844
Title
Stemming techniques for Arabic words: A comparative study
Author
Al-Nashashibi, May Y. ; Neagu, D. ; Yaghi, Ali A.
Author_Institution
Dept. of Comput., Univ. of Bradford, Bradford, UK
fYear
2010
fDate
2-4 Nov. 2010
Firstpage
270
Lastpage
276
Abstract
Text interpretation depends, among other things, on a pre-processing stage that effectively extracts a correct stem or root. Since no standard stemmer is available for Arabic, we examine five methods for extracting Arabic roots; the outcomes of the approach with the best results will be used later on. Four of these methods are based on a positional-letter-ranking approach: the approach itself, an adjusted version, and two proposed variants. The fifth is a rule-based approach. An algorithm for correcting irregular words is applied to all methods, and all approaches are compared. Accuracy was measured by comparing the extracted roots with a predefined list of roots, using an in-house text collection. Results show that the correction algorithm improved the accuracy of the rule-based method by about 14% and that of the positional-letter-ranking-based algorithms by 7% to 10%. The adjusted positional-letter-ranking method had the highest accuracy among the five algorithms, though only slightly higher than the rule-based one. However, the rule-based algorithm had the highest accuracy among all ten algorithms once the correction algorithm was included in it.
Keywords
knowledge based systems; natural language processing; text analysis; word processing; Arabic root extraction; Arabic word; correction algorithm; positional letter ranking approach; rule based approach; stemming technique; text interpretation; text preprocessing; Art; Data preprocessing; Arabic Root Extraction; Natural Language Processing; Positional Letter Ranking; Rule-Based; Text Mining; Variance; t-test;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Technology and Development (ICCTD), 2010 2nd International Conference on
Conference_Location
Cairo
Print_ISBN
978-1-4244-8844-5
Electronic_ISBN
978-1-4244-8845-2
Type
conf
DOI
10.1109/ICCTD.2010.5645873
Filename
5645873