Title :
Improving classification using preprocessing and machine learning algorithms on NSL-KDD dataset
Author :
Deshmukh, Datta H. ; Ghorpade, Tushar ; Padiya, Puja
Author_Institution :
Dept. of Comput. Eng., Ramrao Adik Inst. of Technol., Mumbai, India
Abstract :
Classification is the task of identifying the class labels of records, each of which is typically described by a set of features in a dataset. The paper describes a system that uses a set of data pre-processing activities, namely Feature Selection and Discretization. Feature selection and dimension reduction are common data mining approaches for large datasets. Here, the high dimensionality of the dataset, caused by its large feature set, poses a significant challenge. During pre-processing, a feature selection algorithm selects the required features; these activities help to improve the accuracy of the classifier. After this step, several classifiers are applied: Naive Bayes, Hidden Naive Bayes and NBTree. Hidden Naive Bayes is a data mining model whose advantage is that it relaxes the Naive Bayes method's conditional independence assumption. NBTree induces a hybrid of decision tree classifiers and Naive Bayes classifiers, which significantly improves the accuracy of the classifier and decreases its error rate. The output of the proposed method is checked for true positives, true negatives, false positives and false negatives. Based on these values, the accuracy and error rate of each classifier are computed.
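As an illustration of the final evaluation step, the following minimal sketch computes accuracy and error rate from the four confusion-matrix counts using the standard definitions. The function name and the example counts are hypothetical and are not taken from the paper or its results.

```python
def accuracy_and_error_rate(tp, tn, fp, fn):
    """Return (accuracy, error_rate) from confusion-matrix counts.

    accuracy   = (TP + TN) / (TP + TN + FP + FN)
    error rate = (FP + FN) / (TP + TN + FP + FN)
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total
    return accuracy, error_rate


# Hypothetical counts for illustration only (not results from the paper).
acc, err = accuracy_and_error_rate(tp=40, tn=50, fp=4, fn=6)
print(f"accuracy={acc:.2f}, error rate={err:.2f}")
```

With these definitions the two values always sum to 1, so either one is enough to rank the classifiers against each other.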
Keywords :
data mining; decision trees; feature selection; hidden Markov models; pattern classification; NBTree classifiers; NSL-KDD dataset; Naive Bayes method conditional independence assumption; data dimensionality; data mining approach; data mining model; data preprocessing activities; decision tree classifiers; dimension reduction; feature discretization; feature selection algorithm; hidden Naive Bayes classifiers; machine learning algorithms; record class label identification; Accuracy; Classification algorithms; Computers; Data mining; Error analysis; Intrusion detection; Training; Feature selection; Hidden Naïve Bayes; NBTree; Naïve Bayes; classification; discretization;
Conference_Title :
Communication, Information & Computing Technology (ICCICT), 2015 International Conference on
Conference_Location :
Mumbai
Print_ISBN :
978-1-4799-5521-3
DOI :
10.1109/ICCICT.2015.7045674