DocumentCode :
1785095
Title :
Flow identification and characteristics mining from internet traffic with Hadoop
Author :
Yuanjun Cai ; Bin Wu ; Xinwei Zhang ; Min Luo ; Jinzhao Su
Author_Institution :
Sch. of Comput. Sci., Beijing Univ. of Posts & Commun., Beijing, China
fYear :
2014
fDate :
7-9 July 2014
Firstpage :
1
Lastpage :
5
Abstract :
Flow characteristics describe the patterns and trends of network traffic. They help network operators understand network usage and user behavior, and are especially useful for those concerned with network capacity planning, traffic engineering and fault handling. Due to the large scale of datacenter networks and the explosive growth of traffic volume, it is hard to collect, store and analyze Internet traffic on a single machine. Hadoop has become a popular infrastructure for massive data analytics because it facilitates scalable data processing and storage services on a distributed computing system consisting of commodity hardware. In this paper, we present a Hadoop-based traffic analysis system that accepts input from multiple data traces and performs flow identification, characteristics mining and flow clustering. The output of the system provides guidance for resource allocation, flow scheduling and other tasks. An experiment on a dataset of about 8 GB from a university datacenter network shows that the system is able to finish flow characteristics mining on a four-node cluster within 23 minutes.
Keywords :
Internet; data analysis; data mining; pattern clustering; resource allocation; telecommunication traffic; Hadoop; Internet traffic; characteristics mining; commodity hardware; data analytics; distributed computing system; fault handling; flow clustering; flow identification; flow scheduling; multiple data traces; network capacity planning; network operator; network usage; resource allocation; scalable data processing; storage services; traffic engineering; traffic volume; university datacenter network; user behavior; Algorithm design and analysis; Clustering algorithms; Data mining; Educational institutions; IP networks; Internet; Payloads; Flow Characteristic Mining; Flow Clustering; Hadoop; Software Defined Network; Traffic Analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer, Information and Telecommunication Systems (CITS), 2014 International Conference on
Conference_Location :
Jeju
Print_ISBN :
978-1-4799-4384-5
Type :
conf
DOI :
10.1109/CITS.2014.6878955
Filename :
6878955