• DocumentCode
    2027871
  • Title

    A novel semi-supervised approach for network traffic clustering

  • Author

    Wang, Yu ; Xiang, Yang ; Zhang, Jun ; Yu, Shunzheng

  • Author_Institution
    Sch. of Inf. Technol., Deakin Univ., Melbourne, VIC, Australia
  • fYear
    2011
  • fDate
    6-8 Sept. 2011
  • Firstpage
    169
  • Lastpage
    175
  • Abstract
    Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is applying machine learning techniques to classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have shown that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that in the network domain a great deal of background information is available in addition to the data instances themselves. For example, we might know that flows f1 and f2 are using the same application protocol because they are visiting the same host address at the same port simultaneously. In this case, f1 and f2 should ideally be grouped into the same cluster. Therefore, we describe these correlations in the form of pair-wise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show that such constraints are available and to test the proposed approach. The experimental results indicate that by incorporating constraints in the course of clustering, the overall accuracy and cluster purity can be significantly improved.
  • Keywords
    computer network management; computer network security; constraint handling; learning (artificial intelligence); pattern clustering; constrained clustering algorithms; constraint satisfaction; flow level statistics; k-means algorithm; machine learning techniques; metric learning; network management; network traffic classification; network traffic clustering; payload based methods; port based methods; security systems; semi supervised approach; semi supervised learning method; Classification algorithms; Clustering algorithms; Correlation; Measurement; Partitioning algorithms; Payloads; Protocols; constrained clustering; constraints; machine learning; semi-supervised learning; traffic classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network and System Security (NSS), 2011 5th International Conference on
  • Conference_Location
    Milan
  • Print_ISBN
    978-1-4577-0458-1
  • Type

    conf

  • DOI
    10.1109/ICNSS.2011.6059997
  • Filename
    6059997
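
  • Illustration
    The must-link idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the flow record fields (dst_ip, dst_port, proto, start), the 60-second window used to approximate "simultaneously", the helper names, and the toy feature vectors are all assumptions made for this example. The clustering step is a simple hard-constraint K-Means variant in which every must-link group is assigned to a centroid as a single unit; the paper's three constrained variants are not reproduced here.

```python
from collections import defaultdict


def must_link_groups(flows, window=60.0):
    """Group flow indices that contact the same (dst_ip, dst_port, proto)
    endpoint within `window` seconds of the previous flow to that endpoint.
    The window length is an assumed stand-in for "simultaneously"."""
    by_endpoint = defaultdict(list)
    for idx, f in enumerate(flows):
        key = (f["dst_ip"], f["dst_port"], f["proto"])
        by_endpoint[key].append((f["start"], idx))

    groups = []
    for items in by_endpoint.values():
        items.sort()
        current = [items[0][1]]
        for (prev_t, _), (t, idx) in zip(items, items[1:]):
            if t - prev_t <= window:
                current.append(idx)      # close in time: must-link
            else:
                groups.append(current)   # gap too large: start a new group
                current = [idx]
        groups.append(current)
    return groups


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(points, groups, centroids, iters=20):
    """K-Means in which each must-link group is assigned as one unit to the
    centroid with the smallest summed squared distance over the group."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: one decision per must-link group.
        for g in groups:
            best = min(
                range(len(centroids)),
                key=lambda k: sum(sq_dist(points[i], centroids[k]) for i in g),
            )
            for i in g:
                labels[i] = best
        # Update step: recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical flows: 0 and 1 hit the same endpoint five seconds apart.
flows = [
    {"dst_ip": "10.0.0.1", "dst_port": 80, "proto": "tcp", "start": 0.0},
    {"dst_ip": "10.0.0.1", "dst_port": 80, "proto": "tcp", "start": 5.0},
    {"dst_ip": "10.0.0.2", "dst_port": 53, "proto": "udp", "start": 1.0},
    {"dst_ip": "10.0.0.1", "dst_port": 80, "proto": "tcp", "start": 500.0},
]
# Toy two-dimensional flow statistics, one vector per flow.
points = [(1.0, 1.0), (6.0, 6.0), (9.0, 9.0), (10.0, 10.0)]

groups = must_link_groups(flows)
labels, centroids = constrained_kmeans(
    points, groups, centroids=[(0.0, 0.0), (10.0, 10.0)]
)
print(sorted(groups))
print(labels)
```

    In this toy input, flow 1 on its own lies closer to the second initial centroid, but its must-link to flow 0 keeps the pair in one cluster, which is the effect the abstract describes for flows sharing a host address and port.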