Title :
ProWord: An unsupervised approach to protocol feature word extraction
Author :
Zhuo Zhang ; Zhibin Zhang ; Patrick P. C. Lee ; Yunjie Liu ; Gaogang Xie
Author_Institution :
Inst. of Comput. Technol., Beijing, China
Date :
April 27 - May 2, 2014
Abstract :
Protocol feature words are byte subsequences within traffic payload that can distinguish application protocols, and they form the building blocks for constructing many deep packet analysis rules in network management, measurement, and security systems. However, systematically and efficiently extracting protocol feature words from network traffic remains a challenging issue. Existing n-gram approaches simply break the payload into equal-length pieces and are ineffective in capturing the hidden statistical structure of the payload content. In this paper, we propose ProWord, an unsupervised approach that extracts protocol feature words from traffic traces. ProWord builds on two nontrivial algorithms. First, we propose an unsupervised segmentation algorithm based on a modified Voting Experts algorithm, which breaks the payload into candidate words according to entropy information and provides more accurate segmentation than existing n-gram approaches. Second, we propose a ranking algorithm that incorporates different types of well-known feature word retrieval heuristics, which builds an ordered structure on the candidate words so that the highest-ranked ones can be selected as protocol feature words. We compare ProWord with existing n-gram approaches through an evaluation on real-world traffic traces. We show that ProWord captures true protocol feature words more accurately and runs significantly faster.
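The contrast drawn in the abstract between equal-length splitting and entropy-guided segmentation can be illustrated with a minimal sketch. This is not the ProWord implementation: the function names `fixed_length_pieces` and `boundary_entropy` are hypothetical, and the entropy function only computes the kind of next-byte uncertainty that a Voting Experts style segmenter could use as a boundary signal. The voting and ranking steps are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def fixed_length_pieces(payload: bytes, n: int) -> list:
    """Baseline: break a payload into consecutive, non-overlapping n-byte pieces."""
    return [payload[i:i + n] for i in range(0, len(payload), n)]


def boundary_entropy(payloads, k: int) -> dict:
    """For every k-byte context, compute the entropy (in bits) of the next byte.

    High entropy after a context suggests a likely word boundary; low entropy
    suggests the context sits inside a word. This is an illustrative signal
    only, not the segmentation algorithm from the paper.
    """
    followers = defaultdict(Counter)
    for p in payloads:
        for i in range(len(p) - k):
            followers[p[i:i + k]][p[i + k]] += 1

    entropy = {}
    for context, counts in followers.items():
        total = sum(counts.values())
        entropy[context] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Fixed-length splitting cuts across natural word boundaries.
pieces = fixed_length_pieces(b"GET /index.html HTTP/1.1", 4)
# pieces == [b"GET ", b"/ind", b"ex.h", b"tml ", b"HTTP", b"/1.1"]

# The byte after "GET /" varies across requests, so its entropy is high,
# while the byte after "GET " is always "/", so its entropy is zero.
requests = [b"GET /a HTTP/1.1", b"GET /b HTTP/1.1"]
h5 = boundary_entropy(requests, 5)[b"GET /"]   # 1.0 bit
h4 = boundary_entropy(requests, 4)[b"GET "]    # 0.0 bits
```

In the example, the 4-byte pieces split "/index.html" into fragments that carry no protocol meaning, whereas the entropy signal is low inside the stable prefix and rises where request-specific content begins.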
Keywords :
computer network management; computer network security; feature extraction; information retrieval; protocols; telecommunication traffic; unsupervised learning; word processing; ProWord; byte subsequences; deep packet analysis; entropy information; feature word retrieval heuristics; n-gram approach; network management; nontrivial algorithm; protocol feature word extraction; ranking algorithm; security system; statistical structure; traffic payload; traffic trace; unsupervised segmentation algorithm; voting expert algorithm; Algorithm design and analysis; Computers; Entropy; Feature extraction; Partitioning algorithms; Payloads; Protocols;
Conference_Title :
INFOCOM, 2014 Proceedings IEEE
Conference_Location :
Toronto, ON
DOI :
10.1109/INFOCOM.2014.6848073