DocumentCode
1716308
Title
Improving classification using preprocessing and machine learning algorithms on NSL-KDD dataset
Author
Deshmukh, Datta H. ; Ghorpade, Tushar ; Padiya, Puja
Author_Institution
Dept. Of Comput. Eng., Ramrao Adik Inst. Of Technol., Mumbai, India
fYear
2015
Firstpage
1
Lastpage
6
Abstract
Classification is the task of identifying the class labels of records, which are typically described by a set of features in a dataset. The paper describes a system that applies a set of data pre-processing activities, including feature selection and discretization. Feature selection and dimensionality reduction are common data mining approaches for large datasets; here, the high dimensionality of the dataset, caused by its large feature set, poses a significant challenge. During pre-processing, a feature selection algorithm selects the required features, which helps improve the accuracy of the classifier. After this step, several classifiers are applied: Naive Bayes, Hidden Naive Bayes, and NBTree. Hidden Naive Bayes is a data mining model that relaxes the conditional independence assumption of the Naive Bayes method. NBTree induces a hybrid of decision tree and Naive Bayes classifiers, which significantly improves the accuracy of the classifier and decreases its error rate. The output of the proposed method is evaluated in terms of true positives, true negatives, false positives, and false negatives, and the accuracy and error rate of each classifier are computed from these values.
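The abstract derives each classifier's accuracy and error rate from its true positive, true negative, false positive, and false negative counts. A minimal sketch of that computation, assuming the standard definitions accuracy = (TP + TN) / (TP + TN + FP + FN) and error rate = (FP + FN) / (TP + TN + FP + FN) = 1 - accuracy. The helper name `accuracy_and_error_rate` and the example counts are hypothetical and are not results from the paper; for an intrusion detection dataset such as NSL-KDD one might, for example, treat attack records as the positive class, but that choice is also an assumption here.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    accuracy: float
    error_rate: float


def accuracy_and_error_rate(tp: int, tn: int, fp: int, fn: int) -> Metrics:
    """Return accuracy and error rate from binary confusion-matrix counts.

    Hypothetical helper using the standard definitions:
      accuracy   = (TP + TN) / (TP + TN + FP + FN)
      error rate = (FP + FN) / (TP + TN + FP + FN) = 1 - accuracy
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = tp + tn  # records whose predicted label matches the true label
    wrong = fp + fn    # misclassified records
    return Metrics(accuracy=correct / total, error_rate=wrong / total)


# Illustrative counts only (not results reported in the paper).
m = accuracy_and_error_rate(tp=90, tn=85, fp=15, fn=10)
print(f"accuracy={m.accuracy:.3f} error_rate={m.error_rate:.3f}")
# prints accuracy=0.875 error_rate=0.125
```

Returning both values from one place keeps them consistent: by construction the accuracy and error rate always sum to 1 for the same confusion matrix.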
Keywords
data mining; decision trees; feature selection; hidden Markov models; pattern classification; NBTree classifiers; NSL-KDD dataset; Naive Bayes method conditional independence assumption; data dimensionality; data mining approach; data mining model; data preprocessing activities; decision tree classifiers; dimension reduction; feature discretization; feature selection algorithm; hidden Naive Bayes classifiers; machine learning algorithms; record class label identification; Accuracy; Classification algorithms; Computers; Data mining; Error analysis; Intrusion detection; Training; Feature selection; Hidden Naïve Bayes; NBTree; Naïve Bayes; classification; discretization;
fLanguage
English
Publisher
ieee
Conference_Titel
Communication, Information & Computing Technology (ICCICT), 2015 International Conference on
Conference_Location
Mumbai
Print_ISBN
978-1-4799-5521-3
Type
conf
DOI
10.1109/ICCICT.2015.7045674
Filename
7045674