• DocumentCode
    1796724
  • Title
    Machine-Learning-Based Feature Selection Techniques for Large-Scale Network Intrusion Detection
  • Author
    Al-Jarrah, O.Y. ; Siddiqui, Afzal ; Elsalamouny, M. ; Yoo, Paul D. ; Muhaidat, Sami ; Kim, Kunsu
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Khalifa Univ., Abu Dhabi, United Arab Emirates
  • fYear
    2014
  • fDate
    June 30 2014-July 3 2014
  • Firstpage
    177
  • Lastpage
    181
  • Abstract
    Nowadays, we see more and more cyber-attacks on major Internet sites and enterprise networks. An intrusion detection system (IDS) is a critical component of the defense mechanisms for such infrastructure. An IDS monitors and analyzes network activities for potential intrusions and security attacks. Machine-learning (ML) models have been well accepted for signature-based IDSs due to their learnability and flexibility. However, the performance of existing IDSs does not seem to be satisfactory due to the rapid evolution of sophisticated cyber threats in recent decades. Moreover, the volumes of data to be analyzed are beyond the ability of commonly used computer software and hardware tools. They are not only large in scale but also fast in/out in terms of velocity. In big data IDS, one must find an efficient way to reduce the size of data dimensions and volumes. In this paper, we propose novel feature selection methods, namely, RF-FSR (Random Forest-Forward Selection Ranking) and RF-BER (Random Forest-Backward Elimination Ranking). The features selected by the proposed methods were tested and compared with three of the most well-known feature sets in the IDS literature. The experimental results showed that the features selected by the proposed methods effectively improved the detection rate and false-positive rate, achieving 99.8% and 0.001%, respectively, on the well-known KDD-99 dataset.
  • Keywords
    Big Data; Internet; computer network security; digital signatures; feature selection; learning (artificial intelligence); random processes; Internet sites; KDD-99 dataset; RF-BER; RF-FSR; big data IDS; cyber-attacks; enterprise networks; large-scale network intrusion detection system; machine-learning models; machine-learning-based feature selection techniques; random forest-backward elimination ranking; random forest-forward selection ranking; security attacks; signature-based IDSs; Big data; Computational modeling; Data models; Feature extraction; Intrusion detection; Radio frequency; Training; feature selection; intrusion detection system; machine learning; random forest;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems Workshops (ICDCSW), 2014 IEEE 34th International Conference on
  • Conference_Location
    Madrid
  • ISSN
    1545-0678
  • Print_ISBN
    978-1-4799-4182-7
  • Type
    conf
  • DOI
    10.1109/ICDCSW.2014.14
  • Filename
    6888858