Title :
Training genetic programming on half a million patterns: an example from anomaly detection
Author :
Song, Dong ; Heywood, Malcolm I. ; Zincir-Heywood, A. Nur
Author_Institution :
Quest Software Inc., Halifax, NS, Canada
fDate :
6/1/2005 12:00:00 AM
Abstract :
The hierarchical RSS-DSS algorithm is introduced for dynamically filtering large datasets based on the concepts of training pattern age and difficulty, while utilizing a data structure that makes efficient use of memory hierarchies. This scheme makes it possible to train genetic programming (GP) on a data set of half a million patterns in 15 min. The method is generic and is therefore not specific to a particular GP structure, computing platform, or application context. The method is demonstrated on the real-world KDD-99 intrusion detection data set, where it yields solutions competitive with those identified in the original KDD-99 competition while using only a fraction of the original features. The parameters of the RSS-DSS algorithm are shown to be effective over a wide range of values. An analysis of different cost functions indicates that hierarchical fitness functions provide the most effective solutions.
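The abstract describes a two-level filter: a random subset selection (RSS) level that draws a block from the full data set, and a dynamic subset selection (DSS) level that picks training patterns from that block according to their age and difficulty. The Python below is a minimal sketch of that idea, not the authors' implementation. It assumes the classic Gathercole-and-Ross DSS weighting (weight = difficulty**d + age**a, selection probability proportional to weight); the exponents, reset rules, block and subset sizes, and all names (`Pattern`, `rss_block`, `dss_subset`, `update_difficulty`) are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Pattern:
    x: float
    label: int
    difficulty: int = 0  # hypothetical: misclassifications since last selection
    age: int = 0         # hypothetical: DSS rounds since last selection


def rss_block(dataset, block_size, rng):
    """RSS level (sketch): draw one block uniformly at random from the full data set."""
    return rng.sample(dataset, block_size)


def dss_subset(block, target_size, d_exp, a_exp, rng):
    """DSS level (sketch): select each pattern with probability proportional to
    difficulty**d_exp + age**a_exp (assumed Gathercole-and-Ross-style weighting)."""
    weights = [p.difficulty ** d_exp + p.age ** a_exp for p in block]
    total = sum(weights)
    if total == 0:
        # Nothing has aged or been misclassified yet: fall back to a uniform draw.
        subset = rng.sample(block, min(target_size, len(block)))
    else:
        subset = [p for p, w in zip(block, weights)
                  if rng.random() < min(1.0, target_size * w / total)]
    chosen = set(map(id, subset))
    for p in block:
        if id(p) in chosen:
            p.age, p.difficulty = 0, 0  # selected: reset counters
        else:
            p.age += 1                  # not selected: grows older
    return subset


def update_difficulty(subset, classify):
    """After evaluating GP on the subset, count misclassified patterns as harder."""
    for p in subset:
        if classify(p.x) != p.label:
            p.difficulty += 1


if __name__ == "__main__":
    rng = random.Random(0)
    data = [Pattern(x=rng.random(), label=int(rng.random() < 0.2)) for _ in range(10_000)]
    block = rss_block(data, block_size=1000, rng=rng)
    for _ in range(5):
        subset = dss_subset(block, target_size=50, d_exp=1.0, a_exp=3.5, rng=rng)
        # Hypothetical stand-in for a GP individual: a fixed threshold classifier.
        update_difficulty(subset, classify=lambda x: int(x > 0.8))
    print(len(subset))
```

Because DSS only ever sees the current block, most evaluations touch a small, cache-friendly working set; in this sketch a new RSS block would simply be drawn after a fixed number of DSS rounds.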
Keywords :
data mining; genetic algorithms; learning (artificial intelligence); security of data; anomaly detection; genetic programming training; hierarchical RSS-DSS algorithm; hierarchical fitness functions; large dataset dynamical filtering; real-world KDD-99 intrusion detection data set; Computer applications; Cost function; Data structures; Decision support systems; Decision trees; Detectors; Filtering algorithms; Genetic programming; Intrusion detection; Dynamic subset selection (DSS); genetic programming (GP); hierarchical cost function; intrusion detection; large data sets;
Journal_Title :
Evolutionary Computation, IEEE Transactions on
DOI :
10.1109/TEVC.2004.841683