DocumentCode
3299358
Title
Comparison of rule based classification techniques for the Arabic textual data
Author
Thabtah, Fadi ; Gharaibeh, Omar ; Abdeljaber, Hussein
Author_Institution
MIS Dept., Philadelphia Univ., Amman, Jordan
fYear
2011
fDate
Nov. 29 2011-Dec. 1 2011
Firstpage
105
Lastpage
111
Abstract
Text categorisation has recently attracted many scholars because of the large number of documents on the World Wide Web (WWW) that contain hidden useful information which organisations' managers can utilise for decision making. However, the majority of research in text categorisation concerns English data collections, while there have been only limited research attempts at mining Arabic corpora. This paper investigates the problem of Arabic text categorisation in order to measure the performance of different rule based classification data mining techniques. Specifically, four rule based classification approaches (C4.5, RIPPER, PART, and OneRule) are compared on the known CCA Arabic text data set. Experiments are carried out using a modified version of the WEKA business intelligence tool, and the results indicate that the least suitable classification algorithm for classifying Arabic texts is OneRule, whereas RIPPER, C4.5 and PART have similar performance with respect to error rate.
Keywords
classification; data mining; natural language processing; text analysis; Arabic text categorisation; Arabic textual data; C4.5; CCA Arabic text data set; English data collection; OneRule; PART; RIPPER; WEKA business intelligence tool; World Wide Web; classification data mining; rule based classification; Artificial neural networks; Classification algorithms; Decision trees; Error analysis; Text categorization; Text mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovation in Information & Communication Technology (ISIICT), 2011 Fourth International Symposium on
Conference_Location
Amman
Print_ISBN
978-1-61284-672-9
Type
conf
DOI
10.1109/ISIICT.2011.6149604
Filename
6149604