Title :
Stop-word removal algorithm for Arabic language
Author :
Al-Shalabi, Riyad ; Kanaan, Ghassan ; Jaam, Pr Jihad M ; Hasnah, Ahmad ; Hilat, Eyad
Author_Institution :
Dept. of Comput. Sci., Yarmouk Univ., Irbid, Jordan
Abstract :
Summary form only given. We have designed and implemented an efficient stop-word removal algorithm for the Arabic language based on a finite state machine (FSM). An efficient stop-word removal technique is needed in many natural language processing applications, such as spelling normalization, stemming and stem weighting, question answering systems, and information retrieval (IR) systems. Most existing stop-word removal techniques are based on a dictionary that contains a list of stop-words; this approach is very expensive, because searching the list takes too much time and storing the stop-words requires too much space. The new Arabic stop-word removal technique has been tested using a set of 242 Arabic abstracts chosen from the Proceedings of the Saudi Arabian National Computer Conferences and another data set chosen from the Holy Qur'an, and it gives impressive results of approximately 98%.
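The abstract does not describe the states or transitions of the authors' FSM, so the following is only a minimal sketch of the general idea, under stated assumptions: a deterministic finite-state acceptor is compiled from a small, purely illustrative list of Arabic stop-words, and each token is run through it one character at a time. The names `build_fsm`, `is_stop_word`, `remove_stop_words`, and the sample word list are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: a trie-shaped deterministic finite-state acceptor.
# This is NOT the authors' FSM; the abstract does not specify its design.

ACCEPT = "__accept__"  # marker key; never collides with a one-character transition


def build_fsm(stop_words):
    """Compile words into nested-dict states; each key is a one-character transition."""
    start = {}
    for word in stop_words:
        state = start
        for ch in word:
            state = state.setdefault(ch, {})
        state[ACCEPT] = True  # the state reached at the end of a word is accepting
    return start


def is_stop_word(fsm, token):
    """Run the token through the machine; accept only if it ends in an accepting state."""
    state = fsm
    for ch in token:
        state = state.get(ch)
        if state is None:  # no transition for this character: reject
            return False
    return ACCEPT in state


def remove_stop_words(fsm, tokens):
    """Keep only the tokens that the machine does not accept."""
    return [t for t in tokens if not is_stop_word(fsm, t)]


# Illustrative stop-words only (in, from, on, to, about); an assumed sample list.
SAMPLE_STOP_WORDS = ["في", "من", "على", "إلى", "عن"]

if __name__ == "__main__":
    fsm = build_fsm(SAMPLE_STOP_WORDS)
    tokens = "ذهب الولد إلى المدرسة في الصباح".split()
    print(remove_stop_words(fsm, tokens))
```

In this sketch each token is checked in time proportional to its own length, independent of how many stop-words were compiled into the machine. Whether the authors' FSM achieves its efficiency in the same way is not stated in the abstract.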
Keywords :
finite state machines; natural languages; optimisation; text analysis; Arabic language; dictionary; finite state machine; information retrieval system; information searching process; natural language processing; stop-word removal algorithm; Abstracts; Algorithm design and analysis; Automata; Computational Intelligence Society; Computer science; Dictionaries; Information retrieval; Natural language processing; Testing;
Conference_Title :
Information and Communication Technologies: From Theory to Applications, 2004. Proceedings. 2004 International Conference on
Print_ISBN :
0-7803-8482-2
DOI :
10.1109/ICTTA.2004.1307875