Title of article :
A Hierarchical K-NN Classifier for Textual Data
Author/Authors :
Duwairi, Rehab (Jordan University of Science and Technology, Jordan); Al-Zubaidi, Rania (Jordan University of Science and Technology, Jordan)
From page :
251
To page :
259
Abstract :
This paper presents a classifier based on a modified version of the well-known K-Nearest Neighbors (K-NN) classifier. The original K-NN classifier was adjusted to work with category representatives rather than training documents. Each category was represented by a single document, constructed by consulting all of the category's training documents and then applying feature selection so that only important terms remain. As a result, a new document need only be compared with the category representatives, which are usually substantially fewer than the training documents. This modified K-NN was evaluated in a hierarchical setting, i.e., when categories are organized as a hierarchy. A new document similarity measure was also proposed. It focuses on the co-occurring (matching) terms between a document and a category when calculating the similarity. This measure produces classification accuracy comparable to that obtained with the cosine, Jaccard, or Dice similarity measures, yet it requires much less time. The TechTC-100 hierarchical dataset was used to evaluate the proposed classifier.
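The idea of comparing a new document against one representative per category, rather than against every training document, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy data, and the use of raw term frequency with a top-n cutoff as "feature selection" are all assumptions made for illustration. Cosine similarity is used because the abstract names it as a baseline; the paper's proposed matching-terms measure is not specified here, so it is not reproduced.

```python
from collections import Counter
import math


def build_representative(docs, top_n=3):
    """Merge a category's training documents into one term-weight vector.

    Keeping only the top_n most frequent terms is an assumed stand-in
    for the feature selection step mentioned in the abstract.
    """
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return dict(counts.most_common(top_n))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, representatives):
    """Return the category whose representative is most similar to text."""
    vec = Counter(text.lower().split())
    return max(representatives, key=lambda c: cosine(vec, representatives[c]))


# Toy training data (hypothetical): two categories, two documents each.
training = {
    "sports": ["football match goal", "goal scored in the football match"],
    "finance": ["stock market prices", "market prices fall as stock drops"],
}

# One representative per category replaces all of its training documents.
representatives = {c: build_representative(d) for c, d in training.items()}

predicted = classify("the football goal was late", representatives)
print(predicted)
```

With this setup, classification costs one similarity computation per category instead of one per training document. In a hierarchical setting, the same `classify` step could plausibly be repeated at each level of the category tree, restricted to the children of the category chosen at the level above.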
Keywords :
Text categorization , hierarchical classifiers , K-NN , similarity measures , category representatives
Journal title :
The International Arab Journal of Information Technology (IAJIT)
Record number :
2543574