Improving classification using preprocessing and machine learning algorithms on NSL-KDD dataset

Author

Deshmukh, Datta H. ; Ghorpade, Tushar ; Padiya, Puja

Author_Institution

Dept. of Comput. Eng., Ramrao Adik Inst. of Technol., Mumbai, India

fYear

2015

Firstpage

1

Lastpage

6

Abstract

Classification is the task of identifying the class labels of records that are typically described by a set of features in a dataset. The paper describes a system that uses a set of data pre-processing activities, including feature selection and discretization. Feature selection and dimension reduction are common data mining approaches for large datasets. Here, the high dimensionality of the dataset, caused by its large feature set, poses a significant challenge. During pre-processing, a feature selection algorithm selects the required features; these activities help to improve the accuracy of the classifier. After this step, several classifiers are applied: Naive Bayes, Hidden Naive Bayes, and NBTree. Hidden Naive Bayes is a data mining model that relaxes the Naive Bayes method's conditional independence assumption. NBTree induces a hybrid of decision tree and Naive Bayes classifiers, which significantly improves accuracy and decreases the error rate. The outputs of the proposed method are checked for true positives, true negatives, false positives, and false negatives. Based on these values, the accuracy and error rate of each classifier are computed.
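
The final evaluation step in the abstract, deriving accuracy and error rate from the four confusion counts, can be sketched as follows. This is a minimal illustration using the standard definitions, not the authors' code; the function name and the sample counts are hypothetical and do not come from the paper.

```python
def accuracy_and_error_rate(tp, tn, fp, fn):
    """Return (accuracy, error_rate) from binary confusion counts.

    Standard definitions (assumed here; the abstract does not spell them out):
      accuracy   = (TP + TN) / (TP + TN + FP + FN)
      error rate = (FP + FN) / (TP + TN + FP + FN)
    """
    total = tp + tn + fp + fn
    if total <= 0:
        raise ValueError("confusion counts must sum to a positive number")
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total
    return accuracy, error_rate


# Hypothetical counts for illustration only, not results from the paper.
acc, err = accuracy_and_error_rate(tp=40, tn=50, fp=5, fn=5)
```

Under these definitions the two values always sum to 1, so reporting both, as the abstract does, is mainly a matter of presentation.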

Keywords

data mining; decision trees; feature selection; hidden Markov models; pattern classification; NBTree classifiers; NSL-KDD dataset; Naive Bayes method conditional independence assumption; data dimensionality; data mining approach; data mining model; data preprocessing activities; decision tree classifiers; dimension reduction; feature discretization; feature selection algorithm; hidden Naive Bayes classifiers; machine learning algorithms; record class label identification; Accuracy; Classification algorithms; Computers; Data mining; Error analysis; Intrusion detection; Training; Feature selection; Hidden Naïve Bayes; NBTree; Naïve Bayes; classification; discretization;

fLanguage

English

Publisher

IEEE

Conference_Titel

Communication, Information & Computing Technology (ICCICT), 2015 International Conference on

Conference_Location

Mumbai

Print_ISBN

978-1-4799-5521-3

Type

conf

DOI

10.1109/ICCICT.2015.7045674

Filename

7045674