Title :
Web page classification using firefly optimization
Author :
Sarac, Esra ; Ozel, Selma Ayse
Author_Institution :
Dept. of Comput. Eng., Cukurova Univ., Adana, Turkey
Abstract :
The increase in the amount of information on the Web has created a need for accurate automated classifiers for Web pages, both to maintain Web directories and to improve search engines' performance. Because every (HTML/XML) tag and every term on each Web page can be considered a feature, efficient methods are needed to select the best features and thereby reduce the feature space of the Web page classification problem. In this study, our aim is to apply a recent optimization technique, namely the firefly algorithm (FA), to select the best features for the Web page classification problem. FA is a metaheuristic algorithm inspired by the flashing behavior of fireflies. We use FA to select a subset of features, and we employ the J48 classifier of the Weka data mining tool to evaluate the fitness of the selected features. The WebKB and Conference datasets were used to evaluate the effectiveness of the proposed feature selection system. We observed that when a subset of features is selected by FA, the WebKB and Conference datasets were classified without loss of accuracy; moreover, the time needed to classify new Web pages was reduced sharply as the number of features decreased.
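The abstract does not give the authors' encoding or parameter settings, so the following is only a minimal sketch of firefly-based feature selection under stated assumptions. It assumes the standard FA movement rule x_i <- x_i + beta0 * exp(-gamma * r^2) * (x_j - x_i) + alpha * (rand - 0.5), a binary feature mask obtained by thresholding each position coordinate at zero, and a hypothetical `toy_fitness` (with a hypothetical `INFORMATIVE` index set) standing in for the J48 classification accuracy on WebKB/Conference that the paper used. The names `to_mask`, `firefly_select`, and all default parameter values are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def to_mask(position: Sequence[float]) -> Mask:
    # Assumed binary encoding: keep a feature when x > 0 (i.e. sigmoid(x) > 0.5).
    return [1 if x > 0.0 else 0 for x in position]


def firefly_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    n_fireflies: int = 10,
    n_iter: int = 30,
    beta0: float = 1.0,   # attractiveness at distance 0 (assumed value)
    gamma: float = 0.1,   # light absorption coefficient (assumed value)
    alpha: float = 0.2,   # random-step scale (assumed value)
    seed: int = 0,
) -> Tuple[Mask, float]:
    rng = random.Random(seed)
    # Random initial positions; brightness = fitness of the decoded mask.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_fireflies)]
    light = [fitness(to_mask(p)) for p in pos]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    # Attractiveness decays with squared Euclidean distance.
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Move firefly i toward brighter j, plus a small random step.
                    pos[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                              for a, b in zip(pos[i], pos[j])]
                    light[i] = fitness(to_mask(pos[i]))
    # The brightest firefly never moves here, so the best brightness never drops.
    best = max(range(n_fireflies), key=lambda k: light[k])
    return to_mask(pos[best]), light[best]


# Hypothetical stand-in for classifier accuracy: reward selecting the
# "informative" features and lightly penalize larger subsets.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(1 for k in INFORMATIVE if mask[k])
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)


if __name__ == "__main__":
    mask, fit = firefly_select(8, toy_fitness)
    print(mask, round(fit, 3))
```

Replacing `toy_fitness` with a function that trains and scores a classifier on only the masked columns would bring this closer to the wrapper setup the abstract describes; J48 itself is a Java class in Weka, so that step is left out of this sketch.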
Keywords :
Web sites; XML; data mining; feature extraction; optimisation; pattern classification; search engines; HTML tag; J48 classifier; Web directories; Web information; Web page classification problem; WebKB datasets; Weka data mining tool; XML tag; automated Web page classifiers; conference datasets; feature selection system; feature space; firefly flashing behavior; firefly optimization algorithm; fitness evaluation; metaheuristic algorithm; search engine performance; Classification algorithms; Educational institutions; Feature extraction; Optimization; Search engines; Training; Web pages; Classification; Feature selection; Firefly Algorithm; Web page classification;
Conference_Title :
Innovations in Intelligent Systems and Applications (INISTA), 2013 IEEE International Symposium on
Conference_Location :
Albena
Print_ISBN :
978-1-4799-0659-8
DOI :
10.1109/INISTA.2013.6577619