Title of article

Arabic Text Categorization

Author/Authors

Duwairi, Rehab (Jordan University of Science and Technology, Department of Computer Information Systems, Jordan)

From page

126

To page

132

Abstract

In this paper, we compare the performance of three classifiers for Arabic text categorization: naive Bayes, k-nearest-neighbors (knn), and a distance-based classifier. Unclassified documents were preprocessed by removing punctuation marks and stopwords. Each document was then represented as a vector of words (or of words and their frequencies, in the case of the naive Bayes classifier). Stemming was used to reduce the dimensionality of the documents' feature vectors. The accuracy of the classifiers is compared using recall, precision, error rate, and fallout. Experiments carried out on an in-house collection of Arabic text show that the naive Bayes classifier outperforms the other two.
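
The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard per-category definitions based on confusion counts; the function name `category_metrics` and the example counts are hypothetical and are not taken from the paper, whose exact formulas are not given in this record.

```python
def category_metrics(tp, fp, fn, tn):
    """Per-category measures from confusion counts (standard definitions, assumed).

    tp: documents correctly assigned to the category
    fp: documents wrongly assigned to the category
    fn: documents of the category that were missed
    tn: documents correctly left out of the category
    """
    total = tp + fp + fn + tn
    return {
        # share of the category's documents that were found
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # share of assigned documents that truly belong to the category
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # share of non-category documents that were wrongly assigned
        "fallout": fp / (fp + tn) if (fp + tn) else 0.0,
        # share of all decisions that were wrong
        "error_rate": (fp + fn) / total if total else 0.0,
    }


# Illustrative counts only, not results from the paper.
m = category_metrics(tp=8, fp=2, fn=2, tn=88)
print(m)
```

Higher recall and precision indicate a better classifier, while lower fallout and error rate do.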

Keywords

Text categorization, naive Bayes, knn, distance-based classifier, Arabic language

Journal title

The International Arab Journal of Information Technology (IAJIT)

Record number

2543381

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2543381