• DocumentCode
    3238844
  • Title
    Stemming techniques for Arabic words: A comparative study
  • Author
    Al-Nashashibi, May Y.; Neagu, D.; Yaghi, Ali A.
  • Author_Institution
    Dept. of Comput., Univ. of Bradford, Bradford, UK
  • fYear
    2010
  • fDate
    2-4 Nov. 2010
  • Firstpage
    270
  • Lastpage
    276
  • Abstract
    Text interpretation depends, among other things, on a pre-processing stage that effectively extracts the correct stem or root. Since no standard stemmer is available for Arabic, we address five methods for extracting Arabic roots; the outcomes of the best-performing approach will be used in later work. Four of these methods are based on a positional-letter-ranking approach: the original approach, an adjusted version of it, and two proposed variants. The fifth is a rule-based approach. An algorithm for correcting irregular words is applied to all methods, and all approaches are compared. The accuracy of each method was measured by comparing the extracted roots with a predefined list of roots, using an in-house text collection. Results show that the correction algorithm improved the accuracy of the rule-based algorithm by about 14% and that of the positional-letter-ranking algorithms by 7% to 10%. Among the five algorithms without correction, the adjusted positional-letter-ranking method achieved the highest accuracy, only slightly above the rule-based one. However, when the correction algorithm was included, the rule-based algorithm achieved the highest accuracy among all ten algorithms (the five methods with and without correction).
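    The evaluation described above can be illustrated with a minimal sketch. It assumes that accuracy is the share of words whose extracted root appears in the predefined root list, and that the correction step behaves like a lookup of irregular words. The names `root_accuracy`, `with_correction` and `naive_extract`, the toy root list and the sample words are all hypothetical; `naive_extract` is a deliberately crude stand-in, not one of the five methods compared in the paper.

```python
from typing import Callable, Dict, Iterable, Set

Extractor = Callable[[str], str]


def with_correction(extract_root: Extractor, corrections: Dict[str, str]) -> Extractor:
    """Wrap an extractor so that listed irregular words get a corrected root.

    Hypothetical: the paper's correction algorithm is modelled here as a plain lookup.
    """
    def corrected(word: str) -> str:
        if word in corrections:
            return corrections[word]
        return extract_root(word)
    return corrected


def root_accuracy(words: Iterable[str], extract_root: Extractor, valid_roots: Set[str]) -> float:
    """Fraction of words whose extracted root is found in the predefined root list."""
    words = list(words)
    if not words:
        return 0.0
    hits = sum(1 for w in words if extract_root(w) in valid_roots)
    return hits / len(words)


def naive_extract(word: str) -> str:
    """Toy extractor (not from the paper): drop the article 'ال', keep three letters."""
    if word.startswith("ال"):
        word = word[2:]
    return word[:3]


# Hypothetical root list and sample words; 'قال' is a hollow verb with root 'قول'.
VALID_ROOTS = {"كتب", "درس", "قول"}
SAMPLE = ["كتب", "الدرس", "قال", "يكتب"]

print(root_accuracy(SAMPLE, naive_extract, VALID_ROOTS))  # 0.5
corrected = with_correction(naive_extract, {"قال": "قول"})
print(root_accuracy(SAMPLE, corrected, VALID_ROOTS))  # 0.75
```

    Wrapping the extractor rather than editing it mirrors the abstract's setup, in which the same correction step is applied on top of every method, so each method can be scored both with and without correction.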
  • Keywords
    knowledge based systems; natural language processing; text analysis; word processing; Arabic root extraction; Arabic word; correction algorithm; positional letter ranking approach; rule based approach; stemming technique; text interpretation; text preprocessing; Art; Data preprocessing; Arabic Root Extraction; Natural Language Processing; Positional Letter Ranking; Rule-Based; Text Mining; Variance; t-test;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Technology and Development (ICCTD), 2010 2nd International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-8844-5
  • Electronic_ISBN
    978-1-4244-8845-2
  • Type
    conf
  • DOI
    10.1109/ICCTD.2010.5645873
  • Filename
    5645873