DocumentCode :
2064400
Title :
Improving Arabic document categorization: Introducing local stem
Author :
Al-Shammari, Eiman Tamah
Author_Institution :
Kuwait Univ., Safat, Kuwait
fYear :
2010
fDate :
Nov. 29 2010-Dec. 1 2010
Firstpage :
385
Lastpage :
390
Abstract :
Stemming is a fundamental step in processing textual data, preceding the tasks of text mining, information retrieval (IR), and natural language processing (NLP). The common goal of stemming is to standardize words by reducing each word to its base form (root or stem); it can therefore also be considered a feature reduction technique. This paper presents a new dictionary-free, content-based Arabic stemmer and adopts it as a feature reduction (selection) mechanism to study its contribution to improving Arabic text categorization. We employed three stemming mechanisms (root-based, light, and our stemming technique) and assessed their performance in text classification exercises on an Arabic corpus to compare and contrast the text mining effectiveness of these Arabic stemming algorithms. The experiments were conducted on a corpus of 2,966 Arabic documents that fall into three categories: cultural, social, and general. The experimental results showed that our stemmer significantly improved text classification accuracy.
Keywords :
data mining; pattern classification; text analysis; Arabic document categorization; Arabic stemming algorithms; Arabic text categorization; content-based Arabic stemmer; dictionary free Arabic stemmer; feature reduction technique; light stemming mechanism; root-based stemming mechanism; text classification; text mining; textual data processing; Classification; Stemming; Text Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-8134-7
Type :
conf
DOI :
10.1109/ISDA.2010.5687235
Filename :
5687235