• DocumentCode
    1785095
  • Title

    Flow identification and characteristics mining from internet traffic with hadoop

  • Author

    Yuanjun Cai ; Bin Wu ; Xinwei Zhang ; Min Luo ; Jinzhao Su

  • Author_Institution
    Sch. of Comput. Sci., Beijing Univ. of Posts & Commun., Beijing, China
  • fYear
    2014
  • fDate
    7-9 July 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Flow characteristics describe the patterns and trends of network traffic and help network operators understand network usage and user behavior, which is especially useful for those concerned with network capacity planning, traffic engineering, and fault handling. Due to the large scale of datacenter networks and the explosive growth of traffic volume, it is hard to collect, store, and analyze Internet traffic on a single machine. Hadoop has become a popular infrastructure for massive data analytics because it facilitates scalable data processing and storage services on a distributed computing system consisting of commodity hardware. In this paper, we present a Hadoop-based traffic analysis system that accepts input from multiple data traces and performs flow identification, characteristics mining, and flow clustering. The output of the system provides guidance for resource allocation, flow scheduling, and other tasks. An experiment on a dataset of about 8G in size from a university datacenter network shows that the system is able to finish flow characteristics mining on a four-node cluster within 23 minutes.
  • Keywords
    Internet; data analysis; data mining; pattern clustering; resource allocation; telecommunication traffic; Hadoop; Internet traffic; characteristics mining; commodity hardware; data analytics; distributed computing system; fault handling; flow clustering; flow identification; flow scheduling; multiple data traces; network capacity planning; network operator; network usage; resource allocation; scalable data processing; storage services; traffic engineering; traffic volume; university datacenter network; user behavior; Algorithm design and analysis; Clustering algorithms; Data mining; Educational institutions; IP networks; Internet; Payloads; Flow Characteristic Mining; Flow Clustering; Hadoop; Software Defined Network; Traffic Analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer, Information and Telecommunication Systems (CITS), 2014 International Conference on
  • Conference_Location
    Jeju
  • Print_ISBN
    978-1-4799-4384-5
  • Type

    conf

  • DOI
    10.1109/CITS.2014.6878955
  • Filename
    6878955