• DocumentCode
    675534
  • Title
    Towards improving Khoja rule-based Arabic stemmer
  • Author
    Al-Kabi, Mohammed N.
  • Author_Institution
    Fac. of Sci., IT Zarqa Univ., Zarqa, Jordan
  • fYear
    2013
  • fDate
    3-5 Dec. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Stemming algorithms remove irrelevant morphological variations from words and extract the stem or root from which the input word is derived. Stemming thus helps to standardize terms that refer to the same concept. These algorithms are widely used in information retrieval systems and Web search engines, as well as in other systems such as machine translation, text clustering, text summarization, question answering, indexing, text mining, and text classification. The Khoja stemmer is a standard Arabic stemmer, but it has a number of flaws. Previous studies and this one show that the Khoja stemmer outperforms the two other competing stemmers evaluated in this study. The Khoja stemmer and the other two evaluated Arabic stemmers rely mainly on patterns (forms). Identifying the flaws therefore leads to identifying the patterns missing from the Khoja stemmer. The enhancement to the Khoja stemmer is accordingly restricted to adding the missing patterns, which improves its accuracy by around 5%.
  • Keywords
    information retrieval; natural language processing; Khoja rule-based Arabic stemmer; Web search engines; flaws identification; information retrieval systems; missing pattern identification; stemming algorithms; Accuracy; Algorithm design and analysis; Computers; Conferences; Electrical engineering; Fault diagnosis; Standards; Arabic; Information Retrieval; Root-Based Stemming; Stemming; Tokenization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applied Electrical Engineering and Computing Technologies (AEECT), 2013 IEEE Jordan Conference on
  • Conference_Location
    Amman
  • Print_ISBN
    978-1-4799-2305-2
  • Type
    conf
  • DOI
    10.1109/AEECT.2013.6716437
  • Filename
    6716437