Title of article :
An enhanced ACO algorithm to select features for text categorization and its parallelization
Author/Authors :
Janaki Meena, M.; Ravi chandran, K.R.; Karthik, A.; Vijay Samuel, A.
Issue Information :
Journal issue with serial number, year 2012
Abstract :
Feature selection is an indispensable preprocessing step for effective analysis of high-dimensional data. It removes irrelevant features, improves predictive accuracy and increases the comprehensibility of the model constructed by classifiers that are sensitive to features. Finding an optimal feature subset for a problem in a large domain becomes intractable, and many such feature selection problems have been shown to be NP-hard. Optimization algorithms are frequently designed for NP-hard problems to find nearly optimal solutions with a practical time complexity. This paper formulates the text feature selection problem as a combinatorial problem and proposes an Ant Colony Optimization (ACO) algorithm to find a nearly optimal solution to it. It differs from the earlier algorithm by Aghdam et al. by including a heuristic function based on statistics and a local search. The algorithm aims at determining a solution that includes ‘n’ distinct features for each category. Optimization algorithms based on wrapper models show better results, but the processes involved in them are time-intensive. The availability of parallel architectures, such as a cluster of machines connected through fast Ethernet, has increased interest in the parallelization of algorithms. The proposed ACO algorithm was parallelized and demonstrated on a cluster of up to six machines. Documents from the 20 Newsgroups benchmark dataset were used for experimentation. Features selected by the proposed algorithm were evaluated using a Naïve Bayes classifier and compared with those selected by standard feature selection techniques. The performance of the classifier improved with the features selected by the enhanced ACO and local search. The classifier's error decreased over iterations, and the number of positive features increased with the number of iterations.
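The idea of guiding ants with a statistics-based heuristic can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the heuristic function, the pheromone update or the local search, so the sketch assumes the χ² statistic (listed among the keywords) as the heuristic and the common ACO selection rule in which a feature is chosen with probability proportional to pheromone^alpha × heuristic^beta. The names `chi_square`, `construct_subset`, `alpha` and `beta` are hypothetical.

```python
import random


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - b * c) ** 2 / denom


def construct_subset(pheromone, heuristic, n, alpha=1.0, beta=1.0, rng=random):
    """One ant picks up to n distinct features (assumed selection rule).

    Each remaining feature is drawn with probability proportional to
    pheromone[f] ** alpha * heuristic[f] ** beta.
    """
    candidates = list(pheromone)
    chosen = []
    while candidates and len(chosen) < n:
        weights = [pheromone[f] ** alpha * heuristic[f] ** beta for f in candidates]
        if sum(weights) == 0:
            pick = rng.choice(candidates)  # no guidance left: pick uniformly
        else:
            pick = rng.choices(candidates, weights=weights, k=1)[0]
        candidates.remove(pick)
        chosen.append(pick)
    return chosen
```

In a full wrapper-style loop, each constructed subset would be scored by a classifier and the pheromone values updated from that score; those steps are omitted here because the abstract does not specify them.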
Keywords :
Bag of words , Ant Colony Optimization , Local search , CHIR , χ² , Parallel algorithm , MapReduce , Distributed environment , Metaheuristic algorithms , Heuristic information
Journal title :
Expert Systems with Applications