DocumentCode
2780579
Title
Hybrid ACO and TOFA feature selection approach for text classification
Author
Alghamdi, Hanan S. ; Tang, H. Lilian ; Alshomrani, Saleh
Author_Institution
Inf. Syst. Dept., King Abdulaziz Univ., Jeddah, Saudi Arabia
fYear
2012
fDate
10-15 June 2012
Firstpage
1
Lastpage
6
Abstract
With the rapidly increasing availability of text data on the Internet, selecting an appropriate set of features for text classification becomes more important, not only for reducing the dimensionality of the feature space but also for improving classification performance. This paper proposes a novel feature selection approach that improves the performance of text classifiers by integrating the Ant Colony Optimization (ACO) algorithm with Trace Oriented Feature Analysis (TOFA). ACO is a metaheuristic search algorithm derived from the study of the foraging behavior of real ants, specifically the pheromone communication they use to find the shortest path to a food source. TOFA is a unified optimization framework developed to integrate several state-of-the-art dimension reduction algorithms. Previous research has shown that ACO is a promising approach for optimization and feature selection problems. TOFA is capable of dealing with large-scale text data and can be applied to several text analysis applications such as text classification, clustering, and retrieval. To achieve effective classification performance, the proposed approach uses TOFA and classifier performance as the heuristic information of ACO. Results on the Reuters and Brown public datasets demonstrate the effectiveness of the proposed approach.
Keywords
Internet; information retrieval; optimisation; pattern classification; pattern clustering; search problems; text analysis; Brown public datasets; Internet; Reuters public datasets; TOFA feature selection approach; ant colony optimization algorithm; ant foraging behavior; dimension reduction algorithms; feature space dimensionality; hybrid ACO; metaheuristic search algorithm; text classification; text clustering; text data availability; text retrieval; trace oriented feature analysis; unified optimization framework; Accuracy; Algorithm design and analysis; Bayesian methods; Classification algorithms; Feature extraction; Optimization; Text categorization; Ant Colony Optimization; Trace Oriented Feature Analysis; feature selection; text classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Evolutionary Computation (CEC), 2012 IEEE Congress on
Conference_Location
Brisbane, QLD
Print_ISBN
978-1-4673-1510-4
Electronic_ISBN
978-1-4673-1508-1
Type
conf
DOI
10.1109/CEC.2012.6252960
Filename
6252960