DocumentCode
2597482
Title
Pattern-based algorithm for Part-of-Speech tagging Arabic text
Author
Alqrainy, Shihadeh ; AlSerhan, Hasan Muaidi ; Ayesh, Aladdin
Author_Institution
Prince Abdullah Bin Ghazi Fac. of Sci. & Inf. Technol., AlBalqa Appl. Univ., Amman
fYear
2008
fDate
25-27 Nov. 2008
Firstpage
119
Lastpage
124
Abstract
Building a generic Part-of-Speech (POS) tagger system without a lexicon (dictionary) depends on the language and the characteristics of its grammar, both the morphological and the syntactical systems of that language. The Arabic language has a valuable and important feature, called diacritics, which are marks placed above and below the letters of an Arabic word. This paper presents a novel algorithm to assign the correct POS tag to words belonging to the verb or noun class in an Arabic text. The algorithm is based on the pattern (wazn) of the word instead of using a huge manually tagged lexicon from which large amounts of training data can be extracted. An experiment was run on a data set of 5,000 words belonging to the noun and verb classes to evaluate the accuracy of the algorithm. The algorithm achieved an accuracy of 91%.
Keywords
natural language processing; text analysis; data set; diacritics; noun class; part-of-speech tagging Arabic text; pattern-based algorithm; Data mining; Dictionaries; Information technology; Labeling; Radio access networks; Speech recognition; Speech synthesis; Tagging; Testing; Training data; Arabic Language; Diacritics; Morphological; Part-Of-Speech (POS); Syntactical; Tag set
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Engineering & Systems, 2008. ICCES 2008. International Conference on
Conference_Location
Cairo
Print_ISBN
978-1-4244-2115-2
Electronic_ISBN
978-1-4244-2116-9
Type
conf
DOI
10.1109/ICCES.2008.4772979
Filename
4772979