Title :
Reducing Features of KDD CUP 1999 Dataset for Anomaly Detection Using Back Propagation Neural Network
Author :
Shah, Bhavin ; Trivedi, Bhushan H.
Author_Institution :
L.J. Inst. of Manage. Studies, Ahmedabad, India
Abstract :
The KDD CUP 1999 dataset is extensively used to detect and classify anomalies in computer networks. It was generated by domain experts at MIT Lincoln Laboratory. Various feature reduction techniques have already been applied to reduce the number of features in this dataset, typically from 41 down to between 10 and 22. Using such a reduced dataset in a machine learning algorithm leads to lower complexity, shorter processing time and high accuracy. One such technique, Information Gain (IG), has already been applied to a random forests classifier by Tesfahun et al. Their approach reduces model training time and complexity and considerably improves the detection rate for minority classes. This work investigates the effectiveness and feasibility of Tesfahun et al.'s feature reduction technique with a Back Propagation Neural Network (BPNN) classifier. We performed various experiments on the KDD CUP 1999 dataset and recorded accuracy, precision, recall and F-score values. We compared the reduced dataset with the full-feature dataset in three ways: a basic comparison, N-fold validation and testing. The basic comparison clearly shows that the reduced dataset outperforms the full dataset on size, time and complexity. The N-fold validation experiments show that the classifier trained on the reduced dataset has better generalization capacity. In the testing comparison, both datasets performed comparably. All three comparisons clearly show that the reduced dataset is better than, or at least comparable to, the full dataset and has no drawback relative to it. Our experiments show that using the reduced dataset with a BPNN can lead to a better model in terms of dataset size, complexity, processing time and generalization ability.
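The pipeline described above can be approximated with the following minimal sketch (not the authors' code): scikit-learn's mutual_info_classif stands in for the Information Gain criterion, MLPClassifier for the back propagation neural network, and the 10% KDD CUP 1999 subset bundled with scikit-learn for the full dataset; the hidden-layer size, k=22 and the 5-fold setting are illustrative assumptions rather than values taken from the paper.

import numpy as np
from sklearn.datasets import fetch_kddcup99
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Fetch the 10% KDD CUP 1999 subset bundled with scikit-learn (~494k records).
data = fetch_kddcup99(percent10=True)
X, labels = data.data.copy(), data.target

# Columns 1-3 (protocol_type, service, flag) are symbolic; encode them as integers.
for col in (1, 2, 3):
    X[:, col] = LabelEncoder().fit_transform(X[:, col])
X = X.astype(float)

# Binary target: 0 for normal traffic, 1 for any attack category.
y = np.array([0 if "normal" in str(label) else 1 for label in labels])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Keep the 22 highest-scoring features (the upper end of the 10-22 range above),
# scale them, and train a feed-forward network by backpropagation.
model = make_pipeline(
    SelectKBest(mutual_info_classif, k=22),
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(40,), max_iter=200, random_state=42),
)

# N-fold (here 5-fold) cross-validation to gauge generalization capacity.
cv_acc = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("5-fold accuracy: %.4f +/- %.4f" % (cv_acc.mean(), cv_acc.std()))

# Final fit and held-out evaluation with the metrics reported in the paper.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="binary")
print("accuracy=%.4f precision=%.4f recall=%.4f f-score=%.4f"
      % (accuracy_score(y_test, y_pred), prec, rec, f1))

Feature reduction happens inside the pipeline, so each cross-validation fold selects its 22 features only from its own training portion, which mirrors the paper's comparison of generalization on the reduced versus full feature set.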
Keywords :
backpropagation; data handling; generalisation (artificial intelligence); neural nets; BPNN; F-score values; IG; KDD CUP 1999 data set; KDD CUP 1999 dataset; MIT Lincoln lab; anomaly detection; back propagation neural network classifier; complexity; computer network; dataset size; domain expert; feature reduction techniques; generalization ability; generalization capacity; information gain; machine learning algorithm; n-fold validation; reduced dataset; Accuracy; Complexity theory; Feature extraction; Intrusion detection; Power capacitors; Testing; Training; Back Propagation Neural Network; Feature Reduction; Intrusion Detection System; N Fold Validation;
Conference_Titel :
Advanced Computing & Communication Technologies (ACCT), 2015 Fifth International Conference on
Conference_Location :
Haryana
Print_ISBN :
978-1-4799-8487-9
DOI :
10.1109/ACCT.2015.131