Title of article :
A Text Classification Method Based on Combination of Information Gain and Graph Clustering
Author/Authors :
Abdollahpouri, Alireza Department of Computer Engineering - University of Kurdistan, Sanandaj, Iran , Rahimi, Shadi Department of Computer Engineering - University of Kurdistan, Sanandaj, Iran , Zamani, Fatemeh Department of Computer Engineering - University of Kurdistan, Sanandaj, Iran , Moradi, Parham Department of Computer Engineering - University of Kurdistan, Sanandaj, Iran
Abstract :
Text classification has a wide range of applications, such as spam filtering, automated indexing of scientific articles,
identifying the genre of documents, and news monitoring. Text datasets usually contain a large amount of irrelevant and noisy
information, which reduces the efficiency and increases the cost of their classification. Therefore, for effective text classification,
feature selection methods are widely used to handle the high dimensionality of data. In this paper, a novel feature selection
method based on the combination of information gain and the FAST algorithm is proposed. In the proposed method, the
information gain of each feature is first calculated, and the features with higher information gain are selected. The FAST algorithm,
which uses graph-theoretic clustering methods, is then applied to the selected features. To evaluate the performance of the proposed
method, we carry out experiments on three text datasets and compare our algorithm with several feature selection techniques.
The results confirm that the proposed method produces a smaller feature subset in less time. In addition, the evaluation of a
k-nearest neighbor classifier on validation data shows that the proposed algorithm gives higher classification accuracy.
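
The first stage described in the abstract, ranking features by information gain and keeping the highest-scoring ones, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary "term occurs in document" features, and the function names, toy documents, and cutoff `k` are hypothetical. The second stage (FAST graph-theoretic clustering of the retained features) is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting the documents on the term.
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    ranked = sorted(vocabulary,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy corpus (hypothetical): each document is represented as a set of terms.
docs = [
    {"free", "offer", "click"},
    {"free", "meeting"},
    {"meeting", "agenda"},
    {"offer", "click"},
]
labels = ["spam", "ham", "ham", "spam"]

selected = top_k_terms(docs, labels, k=3)
print(selected)
```

In this toy corpus, "free" occurs once in each class and therefore has zero information gain, so it is not among the selected terms. In a pipeline like the one the abstract describes, the retained terms would then be passed to the clustering stage.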
Keywords :
Feature selection , Information gain , Text categorization , FAST algorithm
Journal title :
International Journal of Information and Communication Technology Research