• DocumentCode
    2057798
  • Title

    Enhanced Algorithm for Extracting the Root of Arabic Words

  • Author

    Ghwanmeh, Sameh ; Kanaan, Ghassan ; Al-Shalabi, Riyad ; Rabab'ah, Saif

  • Author_Institution
    Yarmouk Univ., Irbid, Jordan
  • fYear
    2009
  • fDate
    11-14 Aug. 2009
  • Firstpage
    388
  • Lastpage
    391
  • Abstract
    Stemming is one of many tools used in information retrieval to combat the vocabulary mismatch problem, in which query words do not match document words. Arabic does not fit the usual mold: most stemming research on other languages relies only on removing prefixes and suffixes from the word, whereas Arabic words also contain infixes. This paper introduces an enhanced root-based algorithm that uses stemming to remove all kinds of affixes (prefixes, suffixes, and infixes) according to the morphological pattern of the word. A series of simulation experiments was conducted to test the performance of the proposed algorithm. The results show that the algorithm extracts the correct roots with an accuracy rate of up to 95%.
  • Keywords
    information retrieval; natural language processing; text analysis; Arabic language; enhanced root-based algorithm; morphological pattern; stemming concept; vocabulary mismatch problem; Banking; Computer graphics; Data mining; Image retrieval; Speech; Surface morphology; Testing; Visualization; Vocabulary; Affix; Infix; Prefix; Root; Stem; Stopword; Suffix
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Graphics, Imaging and Visualization, 2009. CGIV '09. Sixth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3789-4
  • Type

    conf

  • DOI
    10.1109/CGIV.2009.10
  • Filename
    5298790