Title of article :
The Effectiveness of Arabic Stemmers Using Arabized Word Removal
Author/Authors :
Al-shalabi, Hamood (Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Malaysia; Sana’a University, Sana’a, Yemen)
Tiun, Sabrina (Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia)
Omar, Nazlia (Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia)
Alezabi, Kamal Ali (Institute of Computer Science & Digital Innovation (ICSDI), UCSI University, Kuala Lumpur, Malaysia)
AL-Aswadi, Fatima N. (Faculty of Computer Science and Engineering, Hodeidah University, Hodeidah, Yemen; Universiti Sains Malaysia, Pulau Pinang, Malaysia)
Pages :
16
From page :
85
To page :
100
Abstract :
Arabic has been influenced by other languages through several factors, including geographical proximity, trade, past Islamic conquests, science and technology, new devices, brand names, models, and fashion. As a result, foreign words, known as Arabised words, appear in Arabic text. Arabised words affect Arabic natural language processing (NLP) tasks because they make it harder to identify the correct stem or root of an Arabic word. Arabic NLP can therefore be made more efficient if Arabised word removal is part of pre-processing. In this paper, we propose an algorithm for detecting and extracting Arabised words as a pre-processing step for Arabic stemming. The algorithm combines lexicon-based and rule-based approaches. The lexicon was compiled from various Arabic text resources, and the rule-based component handles Arabised words that carry definite articles and uses pattern matching on prefixes and suffixes. To evaluate the effect of the proposed algorithm on Arabic NLP, we apply Arabised word removal as a pre-processing step in three Arabic stemmers, namely light stemming, condition light, and ARLS, on three types of standard Arabic datasets. We compare the precision, recall, and IFC of each stemmer with and without Arabised word removal. The results show that the performance of all three stemmers improves when Arabised word removal is included in pre-processing. A more efficient Arabic NLP application or task can therefore be developed if Arabised word removal is included in the pre-processing stage, particularly for Arabic stemming.
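The combination of lexicon lookup and affix rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the lexicon entries, the prefix and suffix lists, the minimum stem length, and the function names (`candidate_forms`, `is_arabised`, `remove_arabised`) are all assumptions chosen for illustration only.

```python
# Hypothetical mini-lexicon of Arabised words; the paper's actual list is not reproduced here.
ARABISED_LEXICON = {"كمبيوتر", "تلفزيون", "انترنت", "تكنولوجيا"}

# Assumed affix lists: definite-article prefixes (optionally fused with a
# conjunction or preposition) and a few common suffixes.
DEFINITE_PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")
SUFFIXES = ("ات", "ية", "ي")

MIN_STEM_LEN = 3  # assumed guard against stripping very short words


def candidate_forms(token):
    """Return the token plus variants with a definite prefix and/or a suffix stripped."""
    forms = {token}
    # Rule 1: strip at most one definite-article prefix (longest listed first).
    for prefix in DEFINITE_PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= MIN_STEM_LEN:
            forms.add(token[len(prefix):])
            break
    # Rule 2: strip a matching suffix from every form collected so far.
    for form in list(forms):
        for suffix in SUFFIXES:
            if form.endswith(suffix) and len(form) - len(suffix) >= MIN_STEM_LEN:
                forms.add(form[:-len(suffix)])
    return forms


def is_arabised(token):
    """A token counts as Arabised if any candidate form appears in the lexicon."""
    return any(form in ARABISED_LEXICON for form in candidate_forms(token))


def remove_arabised(tokens):
    """Filter Arabised words out of a token list before it is passed to a stemmer."""
    return [t for t in tokens if not is_arabised(t)]


if __name__ == "__main__":
    sample = ["الكمبيوتر", "كتاب", "تلفزيونات", "والانترنت"]
    print(remove_arabised(sample))
```

In this sketch the filtered token list would be handed to a stemmer afterwards, so that loan words are never reduced to spurious Arabic roots.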
Keywords :
Arabised Word, Natural Language Processing, Arabised Words Removal, Arabic Text Pre-Processing, Arabic Stemming, Text Processing, Arabic Language
Journal title :
International Journal of Information Science and Management (IJISM)
Serial Year :
2022
Record number :
2730073