• DocumentCode
    2597482
  • Title
    Pattern-based algorithm for Part-of-Speech tagging Arabic text
  • Author
    Alqrainy, Shihadeh ; AlSerhan, Hasan Muaidi ; Ayesh, Aladdin
  • Author_Institution
    Prince Abdullah Bin Ghazi Fac. of Sci. & Inf. Technol., AlBalqa Appl. Univ., Amman
  • fYear
    2008
  • fDate
    25-27 Nov. 2008
  • Firstpage
    119
  • Lastpage
    124
  • Abstract
    Building a generic Part-of-Speech (POS) tagger system without a lexicon (dictionary) depends on the language and the characteristics of its grammar, in both its morphological and its syntactic systems. The Arabic language has a valuable and important feature, called diacritics, which are marks placed above and below the letters of an Arabic word. This paper presents a novel algorithm that assigns the correct POS tag to words belonging to the verb or noun class in Arabic text. The algorithm is based on the pattern (wazn) of the word rather than on a huge manually tagged lexicon from which large amounts of training data can be extracted. An experiment was run on a data set of 5,000 words belonging to the noun and verb classes to evaluate the accuracy of the algorithm. The algorithm achieved an accuracy of 91%.
  • Keywords
    natural language processing; text analysis; data set; diacritics; noun class; part-of-speech tagging Arabic text; pattern-based algorithm; Data mining; Dictionaries; Information technology; Labeling; Radio access networks; Speech recognition; Speech synthesis; Tagging; Testing; Training data; Arabic Language; Diacritics; Morphological; Part-Of-Speech (POS); Syntactical; Tag set
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Engineering & Systems, 2008. ICCES 2008. International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-2115-2
  • Electronic_ISBN
    978-1-4244-2116-9
  • Type
    conf
  • DOI
    10.1109/ICCES.2008.4772979
  • Filename
    4772979