Title :
Using Machine Learning Techniques to Identify Botnet Traffic
Author :
Livadas, Carl ; Walsh, Robert ; Lapsley, David ; Strayer, W. Timothy
Author_Institution :
Department of Internetwork Research, BBN Technologies, Cambridge, MA
Abstract :
To date, techniques to counter cyber-attacks have predominantly been reactive: they focus on monitoring network traffic, detecting anomalies and cyber-attack traffic patterns, and, a posteriori, combating the cyber-attacks and mitigating their effects. Contrary to such approaches, we advocate proactively detecting and identifying botnets before they are used as part of a cyber-attack (Strayer et al., 2006). In this paper, we present our work on using machine learning-based classification techniques to identify the command and control (C2) traffic of IRC-based botnets, i.e., compromised hosts that are collectively commanded using Internet relay chat (IRC). We split this task into two stages: (I) distinguishing between IRC and non-IRC traffic, and (II) distinguishing between botnet and real IRC traffic. For stage I, we compare the performance of J48, naive Bayes, and Bayesian network classifiers, identify the features that achieve good overall classification accuracy, and determine the classification sensitivity to the training set size. While sensitive to the training data and the attributes used to characterize communication flows, machine learning-based classifiers show promise in identifying IRC traffic. Using classification in stage II is trickier, since accurately labeling IRC traffic as botnet and non-botnet is challenging. We are currently exploring labeling flows as suspicious and non-suspicious based on telltale signs that hosts have been compromised.
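To make stage I concrete, the following is a minimal sketch of one of the three classifier families the abstract names (naive Bayes), applied to labeling flows as IRC or non-IRC. It is an illustration under stated assumptions, not the authors' method: it uses a Gaussian naive Bayes written from scratch in standard-library Python, and the three flow features (mean bytes per packet, packets per second, flow duration in seconds), the toy training rows, and the class names "irc" and "non-irc" are all hypothetical. They are not the paper's attributes, data, or results.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes that assumes features are conditionally independent
    given the class and normally distributed within each class."""

    def fit(self, X, y):
        # Group training rows by class label.
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        n = len(y)
        self.priors = {}
        self.stats = {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / n
            stats = []
            for col in zip(*rows):  # iterate over feature columns
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col)
                stats.append((mean, var + 1e-9))  # small floor avoids zero variance
            self.stats[label] = stats
        return self

    def _log_posterior(self, row, label):
        # log P(class) + sum of log Gaussian densities (unnormalized posterior).
        total = math.log(self.priors[label])
        for value, (mean, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, row):
        return max(self.priors, key=lambda label: self._log_posterior(row, label))


# Hypothetical flow features: (mean bytes/packet, packets/sec, duration in s).
X = [
    (60.0, 0.5, 3600.0),
    (80.0, 0.3, 5400.0),
    (70.0, 0.4, 4200.0),
    (1200.0, 40.0, 12.0),
    (1400.0, 60.0, 8.0),
    (1000.0, 35.0, 20.0),
]
y = ["irc", "irc", "irc", "non-irc", "non-irc", "non-irc"]

model = GaussianNaiveBayes().fit(X, y)
print(model.predict((75.0, 0.35, 4000.0)))  # irc
print(model.predict((1300.0, 50.0, 10.0)))  # non-irc
```

Working in log space avoids underflow when many per-feature densities are multiplied. As the abstract notes, results from classifiers like this depend heavily on the training data and on which flow attributes are chosen; with only six toy rows, this sketch says nothing about real-world accuracy.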
Keywords :
Internet; belief networks; command and control systems; learning (artificial intelligence); telecommunication traffic; Bayesian network classifier; IRC-based botnets; Internet relay chat; anomaly detection; botnet traffic; classification sensitivity; classification technique; command and control traffic; communication flow; cyber-attack traffic pattern; machine learning; naive Bayes; network traffic monitoring; real IRC traffic; Bayesian methods; Communication system traffic control; Counting circuits; Labeling; Monitoring; Relays
Conference_Title :
Local Computer Networks, Proceedings 2006 31st IEEE Conference on
Conference_Location :
Tampa, FL
Print_ISBN :
1-4244-0418-5
ISSN :
0742-1303
DOI :
10.1109/LCN.2006.322210