DocumentCode :
25442
Title :
Internet Traffic Classification Using Constrained Clustering
Author :
Yu Wang ; Yang Xiang ; Jun Zhang ; Wanlei Zhou ; Guiyi Wei ; L.T. Yang
Author_Institution :
Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
Volume :
25
Issue :
11
fYear :
2014
fDate :
Nov. 2014
Firstpage :
2932
Lastpage :
2943
Abstract :
Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM to the task, the quality of the resulting traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions with consideration of some background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using a Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.
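The abstract describes folding equivalence set constraints into a Gaussian mixture fitted by maximum likelihood. The sketch below illustrates one common way to do this and is not the paper's own approximate algorithm: every flow in an equivalence set is assumed to share a single hidden component label, so the E-step computes one posterior per set by summing the log-likelihoods of its flows. It additionally assumes diagonal covariances, a mixing prior applied once per set, farthest-first initialization, and a variance floor; the names `constrained_em`, `log_gauss_diag`, and `min_var` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def constrained_em(sets: List[List[Vector]], k: int, iters: int = 50,
                   seed: int = 0, min_var: float = 1e-3):
    """Hypothetical sketch: EM for a diagonal GMM in which all flows of an
    equivalence set share one hidden component label.
    Returns (weights, means, variances, resp); resp[j][c] = P(set j -> c)."""
    rng = random.Random(seed)
    dim = len(sets[0][0])
    # Farthest-first initialization over set centroids (an assumption).
    centroids = [[sum(x[d] for x in s) / len(s) for d in range(dim)] for s in sets]
    means = [list(rng.choice(centroids))]
    while len(means) < k:
        far = max(centroids, key=lambda cen: min(
            sum((a - b) ** 2 for a, b in zip(cen, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: one posterior per set; its flows' log-likelihoods are summed,
        # and the mixing weight is counted once per set (assumed prior).
        resp = []
        for s in sets:
            logp = [math.log(weights[c] + 1e-300)
                    + sum(log_gauss_diag(x, means[c], variances[c]) for x in s)
                    for c in range(k)]
            top = max(logp)  # log-sum-exp shift for numerical stability
            p = [math.exp(lp - top) for lp in logp]
            z = sum(p)
            resp.append([pc / z for pc in p])
        # M-step: weights count sets; each flow inherits its set's posterior.
        for c in range(k):
            weights[c] = sum(r[c] for r in resp) / len(sets)
            n_c = sum(r[c] * len(s) for r, s in zip(resp, sets))  # expected flows
            if n_c < 1e-12:
                continue  # empty component: keep previous parameters
            means[c] = [sum(r[c] * x[d] for r, s in zip(resp, sets) for x in s) / n_c
                        for d in range(dim)]
            variances[c] = [max(sum(r[c] * (x[d] - means[c][d]) ** 2
                                    for r, s in zip(resp, sets) for x in s) / n_c, min_var)
                            for d in range(dim)]
    return weights, means, variances, resp
```

Because each set receives a single posterior, a set whose flows mostly resemble one component pulls all of its members into that component, which is how header-derived side information can override noisy per-flow statistics in this sketch.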
Keywords :
Gaussian processes; Internet; maximum likelihood estimation; mixture models; pattern classification; pattern clustering; telecommunication traffic; transport protocols; unsupervised learning; EM; Gaussian mixture density; TCP/IP networking; application layer protocols; constrained clustering scheme; equivalence set constraints; fundamental binning method; k-means clustering algorithms; labeled training data; machine learning techniques; maximum likelihood estimation; observed traffic statistics; packet headers; payload-based approaches; port-based approach; statistics-based Internet traffic classification; traffic clustering; unsupervised feature discretization; unsupervised learning; Accuracy; Adaptation models; Clustering algorithms; Data models; Internet; Maximum likelihood estimation; Unsupervised learning; Algorithms; clustering; machine learning; network security; traffic analysis;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
ieee
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2013.307
Filename :
6684161