Title :
Hadoop based Deep Packet Inspection system for traffic analysis of e-business websites
Author :
Jiangtao Luo ; Yan Liang ; Wei Gao ; Junchao Yang
Author_Institution :
Electron. Inf. & Networking Res. Inst., Chongqing Univ. of Posts & Telecommun., Chongqing, China
Abstract :
Internet traffic is growing explosively, and online shopping is one of its significant drivers. Alert network operators, unwilling to be dumb pipes, are making every effort to mine this mass traffic with the help of Deep Packet Inspection (DPI). However, DPI is a big challenge for massive data when traditional methods and programming models are used. Hadoop provides an alternative approach with its strengths in distributed storage and parallel computing. This paper reports a Hadoop based DPI system integrated with a web crawler. The system architecture and the MapReduce models for packet analysis and web URL restoration are presented. As an example, live web traffic to Tmall, the leading e-shopping giant in China, was investigated using this system. The popularity of products, categories and brands over a certain period was evaluated from product page views. Detailed product information was provided by the product information base built by the web crawler. This work explores the methodology of using Hadoop in DPI and presents valuable guidelines for developing such a system, which network operators can further use to analyze other services and mine the value of network traffic.
Keywords :
Internet; Web sites; data mining; electronic commerce; information retrieval; parallel processing; retail data processing; telecommunication traffic; China; Hadoop based DPI system; Hadoop based deep packet inspection system; Internet traffic; MapReduce models; Web URL restoration; Web crawler; brand; category; distributed storage; e-business Websites; e-shopping giant; mass traffic mining; online shopping; packet analysis; parallel computing; product page view; product popularity; traffic analysis; Cloud computing; Crawlers; Databases; Inspection; Uniform resource locators; Web pages;
Conference_Title :
Data Science and Advanced Analytics (DSAA), 2014 International Conference on
DOI :
10.1109/DSAA.2014.7058097