DocumentCode
3102102
Title
A new stemming algorithm to extract quadri-literal Arabic roots
Author
Kanaan, Ghassan ; Al-Shalabi, Riyad ; Jaam, Jihad M. ; Al-Kabi, Mohammed Naji ; Hasnah, Ahmad
Author_Institution
Comput. Inf. Syst. Dept., Yarmouk Univ., Irbid, Jordan
fYear
2004
fDate
19-23 April 2004
Firstpage
543
Abstract
Summary form only given. We present a new stemming algorithm to extract quadri-literal Arabic roots. The algorithm first removes the prefixes and then checks the word's characters from the last letter backward to the first. A temporary matrix stores the suffix letters of the Arabic word, and a second matrix stores the root letters. Before this partitioning, the particle is removed from the source word. Each letter is checked to see whether it is included within the general standard Arabic word; if it is, the letter is stored in the temporary matrix, otherwise it is stored in the root matrix. In some cases, original letters of the word are mutated, and the substitute letters are stored in the root matrix. Finally, the letters in the root matrix are arranged in their order in the original word. The algorithm was tested on a sample of 200 randomly generated words derived from quadri-literal Arabic verbs and achieved an accuracy of 95%.
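The backward letter-classification scan summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prefix list (`PREFIXES`), the affix-letter set (`AFFIX_LETTERS`, taken here as the traditional augmentation letters سألتمونيها plus ta marbuta), and the rule that forces an affix letter into the root when too few letters remain are all hypothetical choices, and the letter-mutation step is omitted.

```python
# Hypothetical prefix list; the abstract does not say which prefixes are removed.
PREFIXES = ("وال", "بال", "كال", "فال", "ال")

# Assumption: "affix" letters are the traditional Arabic augmentation letters
# (سألتمونيها) plus ta marbuta; the paper's actual test may differ.
AFFIX_LETTERS = set("سألتمونيها") | {"ة"}

ROOT_LENGTH = 4  # quadri-literal root


def strip_prefix(word: str) -> str:
    """Remove the longest matching prefix, keeping at least ROOT_LENGTH letters."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= ROOT_LENGTH:
            return word[len(prefix):]
    return word


def extract_quadriliteral_root(word: str) -> str:
    stem = strip_prefix(word)
    temp = []  # "temporary matrix": letters classified as affixes
    root = []  # "root matrix": (position, letter) pairs
    # Scan from the last letter backward to the first.
    for i in range(len(stem) - 1, -1, -1):
        letter = stem[i]
        needed = ROOT_LENGTH - len(root)
        remaining = i + 1  # letters not yet examined, including this one
        if needed == 0:
            # Root already has four letters; everything left is treated as affix.
            temp.append(letter)
        elif letter not in AFFIX_LETTERS or remaining <= needed:
            # Non-affix letters go to the root; an affix letter is forced into
            # the root only when no other letters remain to fill it (assumed rule).
            root.append((i, letter))
        else:
            temp.append(letter)
    # Arrange the root letters in their original order within the word.
    root.sort()
    return "".join(letter for _, letter in root)
```

For example, `extract_quadriliteral_root("يبعثرون")` yields `"بعثر"` and `extract_quadriliteral_root("الدحرجة")` yields `"دحرج"`. This single rule returns a wrong root when a root letter is itself in the affix set and affixes follow it (for instance, `"يترجم"`), so a fuller implementation would need additional checks for such cases.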
Keywords
natural languages; text analysis; word processing; Arabic word; quadri-literal Arabic roots; root matrix; stemming algorithm; word characters extraction; Computer science; Data mining; Genetic mutations; Information systems; Matrices; Partitioning algorithms; Testing;
fLanguage
English
Publisher
ieee
Conference_Titel
Information and Communication Technologies: From Theory to Applications, 2004. Proceedings. 2004 International Conference on
Print_ISBN
0-7803-8482-2
Type
conf
DOI
10.1109/ICTTA.2004.1307872
Filename
1307872