• DocumentCode
    3299358
  • Title

    Comparison of rule based classification techniques for the Arabic textual data

  • Author

    Thabtah, Fadi ; Gharaibeh, Omar ; Abdeljaber, Hussein

  • Author_Institution
    MIS Dept., Philadelphia Univ., Amman, Jordan
  • fYear
    2011
  • fDate
    Nov. 29 2011-Dec. 1 2011
  • Firstpage
    105
  • Lastpage
    111
  • Abstract
    Text categorisation has recently attracted many scholars because of the large number of documents on the World Wide Web (WWW) that contain hidden useful information which can be utilised by organisations' managers for decision making. However, the majority of research conducted in text categorisation relates to English data collections, while only limited research has been conducted on mining Arabic corpora. This paper investigates the problem of Arabic text categorisation in order to measure the performance of different rule based classification data mining techniques. Specifically, four rule based classification approaches, C4.5, RIPPER, PART, and OneRule, are compared on the known CCA Arabic text data set. Experiments are carried out using a modified version of the WEKA business intelligence tool, and the results indicate that the least suitable classification algorithm for classifying Arabic texts is OneRule, whereas RIPPER, C4.5 and PART have similar performance with respect to error rate.
  • Keywords
    classification; data mining; natural language processing; text analysis; Arabic text categorisation; Arabic textual data; C4.5; CCA Arabic text data set; English data collection; OneRule; PART; RIPPER; WEKA business intelligence tool; World Wide Web; classification data mining; rule based classification; Artificial neural networks; Classification algorithms; Decision trees; Error analysis; Text categorization; Text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovation in Information & Communication Technology (ISIICT), 2011 Fourth International Symposium on
  • Conference_Location
    Amman
  • Print_ISBN
    978-1-61284-672-9
  • Type

    conf

  • DOI
    10.1109/ISIICT.2011.6149604
  • Filename
    6149604