A new ensemble-feature-selection framework for intrusion detection

Author

Hai Thanh Nguyen;Katrin Franke;Slobodan Petrović

Author_Institution

Norwegian Information Security Laboratory, Gjø

fYear

2011

Firstpage

213

Lastpage

218

Abstract

Feature selection is an important part of a pattern recognition system. A feature selection method must be general enough to find representative features in the training data, which are then used to classify test patterns. The situation in which the features selected from the training data differ substantially from the representative features of the test data is called over-selecting. The main causes of over-selecting are non-comprehensive consideration of the statistical properties of the training data, heuristic search strategies for feature selection, and a small training sample size. In this paper, we show how over-selecting influences the over-fitting of machine learning algorithms. We propose a new framework that addresses the principal causes of over-selecting and thus reduces the chance of over-fitting. The framework, which we call the Ensemble Feature Selection measure (EnFS), considers many statistical properties of a given data set at the same time by combining several feature selection methods used in the filter model. From the chosen feature selection measures, a new combined measure is constructed. We also propose a new search algorithm that guarantees globally optimal feature subsets with respect to the constructed measure. The search is based on solving a mixed 0-1 linear programming (M01LP) problem with the branch-and-bound algorithm. In this M01LP problem, the number of constraints and variables is linear in the number of features in the full set. To evaluate the quality of the EnFS measure, we chose the design of an intrusion detection system (IDS) as a possible application. Experimental results on the KDD Cup '99 benchmark data set for IDS show that the EnFS measure is capable of reducing over-fitting by addressing over-selecting.
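
The idea of building one combined measure from several filter measures and then searching for its global optimum over 0-1 feature indicators can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not say which filter measures are combined or how, so the sketch simply adds a CFS-style merit and an mRMR-style score, uses made-up relevance and redundancy values, and replaces the M01LP branch-and-bound search with plain exhaustive enumeration, which is also globally optimal but only feasible for a handful of features. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_sum(subset, redundancy):
    """Sum of feature-feature redundancy over all unordered pairs in subset."""
    return sum(redundancy[i][j] for i, j in combinations(subset, 2))


def cfs_merit(subset, relevance, redundancy):
    """CFS-style merit: total relevance divided by sqrt(k + 2 * pairwise redundancy)."""
    k = len(subset)
    total_relevance = sum(relevance[i] for i in subset)
    return total_relevance / sqrt(k + 2 * pair_sum(subset, redundancy))


def mrmr_score(subset, relevance, redundancy):
    """mRMR-style score: mean relevance minus mean pairwise redundancy."""
    k = len(subset)
    mean_relevance = sum(relevance[i] for i in subset) / k
    return mean_relevance - 2 * pair_sum(subset, redundancy) / k ** 2


def combined_measure(subset, relevance, redundancy):
    """Assumed combination rule: the plain sum of the two filter measures."""
    return (cfs_merit(subset, relevance, redundancy)
            + mrmr_score(subset, relevance, redundancy))


def exhaustive_search(relevance, redundancy):
    """Score every non-empty subset and return the best one with its score.

    This stands in for the branch-and-bound search on the M01LP problem;
    it visits all 2^n - 1 subsets, so it is only usable for very small n.
    """
    n = len(relevance)
    candidates = (s for k in range(1, n + 1)
                  for s in combinations(range(n), k))
    best = max(candidates,
               key=lambda s: combined_measure(s, relevance, redundancy))
    return best, combined_measure(best, relevance, redundancy)


# Toy, made-up values: features 0 and 1 are nearly redundant with each other.
relevance = [0.80, 0.78, 0.75]
redundancy = [
    [0.00, 0.90, 0.05],
    [0.90, 0.00, 0.05],
    [0.05, 0.05, 0.00],
]

best_subset, best_score = exhaustive_search(relevance, redundancy)
print(best_subset, round(best_score, 4))
```

On these toy values the search keeps feature 0, drops the nearly redundant feature 1, and adds feature 2, which is the behavior a measure that balances relevance against redundancy is meant to produce.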

Keywords

"Testing","Training data","Training","Polynomials","Computational modeling","Programming","Machine learning algorithms"

Publisher

IEEE

Conference_Titel

Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on

ISSN

2164-7143

Print_ISBN

978-1-4577-1676-8

Electronic_ISBN

2164-7151

Type

conf

DOI

10.1109/ISDA.2011.6121657

Filename

6121657