DocumentCode :
3645149
Title :
A new ensemble-feature-selection framework for intrusion detection
Author :
Hai Thanh Nguyen;Katrin Franke;Slobodan Petrović
Author_Institution :
Norwegian Information Security Laboratory, Gjø
fYear :
2011
Firstpage :
213
Lastpage :
218
Abstract :
Feature selection is an important part of a pattern recognition system. A feature selection method must be general enough to find representative features in the training data, which are then used to classify test patterns. The situation in which the features selected from the training data differ substantially from the representative features of the test data is called over-selecting. The main causes of over-selecting are non-comprehensive consideration of the statistical properties of the training data, heuristic search strategies for feature selection, and a small training sample size. In this paper, we show how over-selecting influences the over-fitting of machine learning algorithms. We propose a new framework that addresses the principal causes of over-selecting and thus reduces the chance of over-fitting. The framework, which we call the Ensemble Feature Selection measure (EnFS), considers many statistical properties of a given data set at the same time by combining several feature selection methods used in the filter model. From the chosen feature selection measures, a new combined measure is constructed. We also propose a new search algorithm that guarantees globally optimal feature subsets with respect to the constructed measure. The search is based on solving a mixed 0-1 linear programming (M01LP) problem with the branch-and-bound algorithm. In this M01LP problem, the number of constraints and variables is linear in the number of features in the full set. To evaluate the quality of the EnFS measure, we chose the design of an intrusion detection system (IDS) as a possible application. Experimental results obtained on the KDD CUP'99 benchmark data set for IDS show that the EnFS measure is capable of reducing over-fitting by addressing over-selecting.
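The contrast the abstract draws between heuristic search and a globally optimal subset can be illustrated with a minimal Python sketch. It does not reproduce the EnFS measure or the M01LP formulation, which this record does not specify. Instead it assumes a simple relevance-minus-redundancy filter score and finds the optimum by exhaustive enumeration, which is only feasible for a handful of features. The function names and the toy relevance and redundancy values are hypothetical.

```python
from itertools import combinations


def filter_score(subset, relevance, redundancy):
    """Assumed filter measure: total relevance minus total pairwise redundancy.

    This is an illustrative stand-in, not the combined EnFS measure.
    """
    total_relevance = sum(relevance[i] for i in subset)
    total_redundancy = sum(redundancy[i][j] for i, j in combinations(subset, 2))
    return total_relevance - total_redundancy


def best_subset(relevance, redundancy):
    """Return the non-empty feature subset with the highest score.

    Every subset is enumerated, so the result is globally optimal for the
    given score, at a cost exponential in the number of features.
    """
    n = len(relevance)
    candidates = (
        subset
        for size in range(1, n + 1)
        for subset in combinations(range(n), size)
    )
    return max(candidates, key=lambda s: filter_score(s, relevance, redundancy))


# Hypothetical toy data: features 0 and 1 are both relevant but highly
# redundant with each other; feature 2 is weaker but nearly independent.
relevance = [0.9, 0.8, 0.3]
redundancy = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]

selected = best_subset(relevance, redundancy)
print(selected)
```

With these toy values the search keeps features 0 and 2 and drops the redundant feature 1. Any other score function, including a combination of several filter measures, could be substituted for `filter_score`; replacing the enumeration with a branch-and-bound solver is what would make the search practical for larger feature sets.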
Keywords :
"Testing","Training data","Training","Polynomials","Computational modeling","Programming","Machine learning algorithms"
Publisher :
IEEE
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on
ISSN :
2164-7143
Print_ISBN :
978-1-4577-1676-8
Electronic_ISBN :
2164-7151
Type :
conf
DOI :
10.1109/ISDA.2011.6121657
Filename :
6121657