Title : 
Classification of Partially Labeled Malicious Web Traffic in the Presence of Concept Drift
         
        
            Author : 
Anastasovski, Goce ; Goseva-Popstojanova, Katerina
         
        
            Author_Institution : 
Alarm.com, Vienna, VA, USA
         
        
        
            fDate : 
June 30 2014-July 2 2014
         
        
        
        
            Abstract : 
Attacks on Web systems have shown an increasing trend in the recent past. A contributing factor to this trend is the deployment of Web 2.0 technologies. While work on the characterization and classification of malicious Web traffic using supervised learning exists, little work has been done using semi-supervised learning with partially labeled data. In this paper, an incremental semi-supervised algorithm (CSL-Stream) is used to classify malicious Web traffic into multiple classes, as well as to analyze the concept drift and concept evolution phenomena. The work is based on data collected over a period of nine months by a high-interaction honeypot running Web 2.0 applications. The results showed that, on completely labeled data, semi-supervised learning performed only slightly worse than the supervised learning algorithm. More importantly, multiclass classification of the partially labeled malicious Web traffic (i.e., 50% or 25% labeled sessions) was almost as good as the classification of completely labeled data.
         
        
            Keywords : 
Internet; learning (artificial intelligence); pattern classification; security of data; CSL-Stream; Web 2.0 technologies; Web systems; concept drift; concept evolution; high-interaction honeypot; incremental semisupervised algorithm; malicious Web traffic; partially labeled data; partially labeled malicious Web traffic classification; semisupervised learning; supervised learning; Accuracy; Blogs; Electronic publishing; Information services; Measurement; Semisupervised learning; Concept drift; Concept evolution; Malicious Web traffic classification; Multiclass classification; Semi-supervised learning; Web 2.0 security
         
        
        
        
            Conference_Title : 
Software Security and Reliability-Companion (SERE-C), 2014 IEEE Eighth International Conference on
         
        
            Conference_Location : 
San Francisco, CA
         
        
        
            DOI : 
10.1109/SERE-C.2014.31