• DocumentCode
    3532825
  • Title

    Internet traffic classification based on bag-of-words model

  • Author

    Yin Zhang ; Yi Zhou ; Kai Chen

  • Author_Institution
    Sch. of Inf. Security Eng., Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2012
  • fDate
    3-7 Dec. 2012
  • Firstpage
    736
  • Lastpage
    741
  • Abstract
    Interest in traffic classification has grown dramatically in the past few years in both industry and academia. As more and more applications encrypt their payloads and avoid well-known ports, traditional traffic classification methods, such as those based on transport-layer protocol ports, cannot deal with these applications accurately and efficiently. In this paper we investigate the problem of classifying traffic flows into different application categories. We propose a new traffic classification method based on the bag-of-words (BoW) model, which has been widely used in document classification and computer vision. In the new method, the application categories of interest represent the bags, and centroids represent the words of the BoW model. By constructing representation vectors for the application categories and calculating the cosine similarity between each category representation vector and a newly built-up vector converted from the flows to be tested, we can find the application category to which a tested flow belongs. Using real traffic traces, we demonstrate that the proposed approach achieves 93% overall accuracy and that the classification is not affected by the packet arrival sequence (e.g., out-of-order arrivals). In our experiment, the overall accuracy of the proposed approach is observed to be 10% higher than that of the widely used C4.5 algorithm when out-of-order arrivals occur.
  • Keywords
    Internet; computer vision; decision trees; document handling; pattern classification; queueing theory; telecommunication traffic; BoW model; C4.5 algorithm; Internet traffic classification; bag-of-words model; built-up vector; category representation vector; computer vision; cosine similarity; document classification; packet arrival sequence; Accuracy; Computational modeling; Out of order; Ports (Computers); Protocols; Training; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Globecom Workshops (GC Wkshps), 2012 IEEE
  • Conference_Location
    Anaheim, CA
  • Print_ISBN
    978-1-4673-4942-0
  • Electronic_ISBN
    978-1-4673-4940-6
  • Type

    conf

  • DOI
    10.1109/GLOCOMW.2012.6477666
  • Filename
    6477666
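
The matching step described in the abstract (centroids as words, one representation vector per application category, cosine similarity against the vector built from a tested flow) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the single packet-size feature, the fixed centroid values, the category names, and all function names are assumptions made for the example, and the record does not say how the centroids are obtained.

```python
import math
from collections import Counter

def nearest_word(packet, centroids):
    """Index of the centroid (the 'word') closest to a packet feature vector."""
    return min(range(len(centroids)), key=lambda i: math.dist(packet, centroids[i]))

def histogram(packets, centroids):
    """Word-count vector for a collection of packets; packet order is ignored."""
    counts = Counter(nearest_word(p, centroids) for p in packets)
    return [counts.get(i, 0) for i in range(len(centroids))]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(flow, centroids, category_vectors):
    """Return the category whose representation vector is most similar to the flow."""
    vec = histogram(flow, centroids)
    return max(category_vectors, key=lambda c: cosine(vec, category_vectors[c]))

# Hypothetical centroids over a single feature (packet size in bytes).
centroids = [(60.0,), (600.0,), (1400.0,)]

# Hypothetical labelled training packets for two made-up categories.
training = {
    "bulk": [(1400.0,), (1380.0,), (70.0,), (1420.0,)],
    "chat": [(160.0,), (172.0,), (60.0,), (180.0,)],
}
category_vectors = {c: histogram(p, centroids) for c, p in training.items()}

# A flow to be tested, and the same packets in reversed arrival order.
flow = [(64.0,), (1390.0,), (1410.0,), (1400.0,)]
label = classify(flow, centroids, category_vectors)
label_reordered = classify(list(reversed(flow)), centroids, category_vectors)
```

Because each flow is reduced to word counts, reordering its packets leaves the vector unchanged, which is consistent with the abstract's claim that classification is not affected by the packet arrival sequence.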