  • Title of article

    CLASSICAL ARABIC POETRY CATEGORIZATION USING N-GRAM FREQUENCY STATISTICS

  • Author/Authors

    Mohammad, Iqbal AbdulBaki (Foundation of Technical Education, College of Technical Medical Health, Iraq)

  • Pages
    7
  • From page
    159
  • To page
    165
  • Abstract
    Most Arabic vocabulary is derived from roots, which are words composed of three to five consonant letters. Any information retrieval task in Arabic must first deal with the language's morphological and structural changes (the stemming process) before a statistical method for extracting information is applied. This paper presents a method for categorizing Classical Arabic Poetry (CAP) into its categories (Ghazal, Medeh, Wasef, Hijaa, etc.) by combining a light stemmer algorithm, which identifies the sets of prefixes and suffixes in an Arabic word in order to reach the word root after removing them, with the N-gram statistical method, which retrieves information independently of the language's complexity. Two measures are implemented for categorization: the Manhattan distance dissimilarity coefficient and Dice's similarity coefficient.
  • Keywords
    CLASSICAL ARABIC POETRY, CATEGORIZATION, N-GRAM FREQUENCY STATISTICS
  • Journal title
    Iraqi Journal Of Science
  • Serial Year
    2010
  • Record number

    2682719
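The abstract above outlines a pipeline of light stemming, N-gram frequency profiles, and categorization by Manhattan distance or Dice's coefficient. The Python sketch below illustrates that pipeline under stated assumptions: the affix lists, the trigram size (n = 3), padding each word with spaces, normalizing frequencies before the Manhattan distance, and computing Dice over sets of distinct n-grams are all illustrative choices, not details taken from the paper. The names `light_stem`, `ngram_profile`, `manhattan`, `dice`, and `categorize` are hypothetical.

```python
from collections import Counter

# Hypothetical, illustrative affix lists (the paper's own lists are not given
# in the abstract). Longer affixes come first so they are tried before shorter ones.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM = 3  # never cut a word below three letters (roots have 3 to 5 consonants)


def light_stem(word: str) -> str:
    """Remove at most one prefix and then at most one suffix, keeping >= MIN_STEM letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count character n-grams of each light-stemmed word, padded with one space per side."""
    profile = Counter()
    for word in text.split():
        padded = f" {light_stem(word)} "
        for i in range(len(padded) - n + 1):
            profile[padded[i:i + n]] += 1
    return profile


def manhattan(p: Counter, q: Counter) -> float:
    """Manhattan (L1) distance between relative n-gram frequencies; 0.0 means identical."""
    tp, tq = sum(p.values()) or 1, sum(q.values()) or 1
    return sum(abs(p[g] / tp - q[g] / tq) for g in set(p) | set(q))


def dice(p: Counter, q: Counter) -> float:
    """Dice similarity 2|A & B| / (|A| + |B|) over the sets of distinct n-grams."""
    a, b = set(p), set(q)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def categorize(poem: str, training: dict[str, str], measure: str = "dice") -> str:
    """Assign the category whose training profile is closest (Manhattan) or most similar (Dice)."""
    profiles = {cat: ngram_profile(text) for cat, text in training.items()}
    target = ngram_profile(poem)
    if measure == "manhattan":
        return min(profiles, key=lambda c: manhattan(target, profiles[c]))
    return max(profiles, key=lambda c: dice(target, profiles[c]))
```

For example, `categorize(new_poem, training, measure="manhattan")` returns the label of the nearest category profile, where `training` maps each category name (such as Ghazal or Hijaa) to concatenated sample poems. Set-based Dice ignores how often each n-gram occurs, while the Manhattan variant here compares normalized frequencies so that long and short training texts remain comparable.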