• DocumentCode
    1716308
  • Title

    Improving classification using preprocessing and machine learning algorithms on NSL-KDD dataset

  • Author

    Deshmukh, Datta H. ; Ghorpade, Tushar ; Padiya, Puja

  • Author_Institution
    Dept. of Comput. Eng., Ramrao Adik Inst. of Technol., Mumbai, India
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Classification is the task of identifying the class labels of records that are typically described by a set of features in a dataset. The paper describes a system that uses a set of data pre-processing activities, including feature selection and discretization. Feature selection and dimension reduction are common data mining approaches for large datasets. Here, the high dimensionality of the dataset, due to its large feature set, poses a significant challenge. During pre-processing, a feature selection algorithm selects the required features; these activities help to improve the accuracy of the classifier. After this step, various classifiers are applied: Naive Bayes, Hidden Naive Bayes, and NBTree. Hidden Naive Bayes is a data mining model whose advantage is that it relaxes the Naive Bayes method's conditional independence assumption. The next classifier used is NBTree, which induces a hybrid of decision tree and Naive Bayes classifiers and significantly improves the accuracy and decreases the error rate of the classifier. The outputs of the proposed method are checked for true positives, true negatives, false positives, and false negatives. Based on these values, the accuracy and error rate of each classifier are computed.
  • Keywords
    data mining; decision trees; feature selection; hidden Markov models; pattern classification; NBTree classifiers; NSL-KDD dataset; Naive Bayes method conditional independence assumption; data dimensionality; data mining approach; data mining model; data preprocessing activities; decision tree classifiers; dimension reduction; feature discretization; feature selection algorithm; hidden Naive Bayes classifiers; machine learning algorithms; record class label identification; Accuracy; Classification algorithms; Computers; Data mining; Error analysis; Intrusion detection; Training; Feature selection; Hidden Naïve Bayes; NBTree; Naïve Bayes; classification; discretization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communication, Information & Computing Technology (ICCICT), 2015 International Conference on
  • Conference_Location
    Mumbai
  • Print_ISBN
    978-1-4799-5521-3
  • Type

    conf

  • DOI
    10.1109/ICCICT.2015.7045674
  • Filename
    7045674
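
The abstract states that accuracy and error rate are computed from the true positive, true negative, false positive, and false negative counts. A minimal sketch of that computation follows, assuming the standard confusion-matrix definitions (the record itself does not give the formulas). The function name and the example counts are hypothetical and are not taken from the paper.

```python
def accuracy_and_error_rate(tp, tn, fp, fn):
    """Return (accuracy, error_rate) from confusion-matrix counts.

    Assumes the standard definitions:
      accuracy   = (TP + TN) / (TP + TN + FP + FN)
      error rate = (FP + FN) / (TP + TN + FP + FN)
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one classified record is required")
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total
    return accuracy, error_rate


# Hypothetical counts for illustration only (not results from the paper).
acc, err = accuracy_and_error_rate(tp=40, tn=50, fp=5, fn=5)
print(f"accuracy={acc:.3f} error_rate={err:.3f}")
```

Under these definitions the two values always sum to 1, so either one determines the other.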