• DocumentCode
    240910
  • Title
    Classification of Partially Labeled Malicious Web Traffic in the Presence of Concept Drift
  • Author
    Anastasovski, Goce ; Goseva-Popstojanova, Katerina
  • Author_Institution
    Alarm.com, Vienna, VA, USA
  • fYear
    2014
  • fDate
    June 30, 2014 - July 2, 2014
  • Firstpage
    130
  • Lastpage
    139
  • Abstract
    Attacks on Web systems have shown an increasing trend in the recent past. A contributing factor to this trend is the deployment of Web 2.0 technologies. While work on the characterization and classification of malicious Web traffic using supervised learning exists, little work has been done using semi-supervised learning with partially labeled data. In this paper, an incremental semi-supervised algorithm (CSL-Stream) is used to classify malicious Web traffic into multiple classes, as well as to analyze the concept drift and concept evolution phenomena. The work is based on data collected over a period of nine months by a high-interaction honeypot running Web 2.0 applications. The results showed that, on completely labeled data, semi-supervised learning performed only slightly worse than the supervised learning algorithm. More importantly, multiclass classification of the partially labeled malicious Web traffic (i.e., 50% or 25% labeled sessions) was almost as good as the classification of completely labeled data.
  • Keywords
    Internet; learning (artificial intelligence); pattern classification; security of data; CSL-Stream; Web 2.0 technologies; Web systems; concept drift; concept evolution; high-interaction honeypot; incremental semisupervised algorithm; malicious Web traffic; partially labeled data; partially labeled malicious Web traffic classification; semisupervised learning; supervised learning; Accuracy; Blogs; Electronic publishing; Information services; Measurement; Semisupervised learning; Concept drift; Concept evolution; Malicious Web traffic classification; Multiclass classification; Semi-supervised learning; Web 2.0 security
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Security and Reliability-Companion (SERE-C), 2014 IEEE Eighth International Conference on
  • Conference_Location
    San Francisco, CA
  • Type
    conf
  • DOI
    10.1109/SERE-C.2014.31
  • Filename
    6901650