Title :
Head-body partitioned string matching for Deep Packet Inspection with scalable and attack-resilient performance
Author :
Yang, Yi-Hua E. ; Prasanna, Viktor K. ; Jiang, Chenqian
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Abstract :
Dictionary-based string matching (DBSM) is a critical component of Deep Packet Inspection (DPI), where thousands of malicious patterns are matched against high-bandwidth network traffic. Deterministic finite automata constructed with the Aho-Corasick algorithm (AC-DFA) have been widely used for solving this problem. However, the state transition table (STT) of a large-scale DBSM AC-DFA can span hundreds of megabytes of system memory, whose limited bandwidth and long latency could become the performance bottleneck. We propose a novel partitioning algorithm which converts an AC-DFA into a "head" part and a "body" part. The head part behaves as a traditional AC-DFA that matches the pattern prefixes up to a predefined length; the body part extends any head match to the full pattern length in parallel body-tree traversals. Taking advantage of the SIMD instructions in modern x86-64 multi-core processors, we design compact and efficient data structures that pack multi-path and multi-stride pattern segments in the body-tree. Compared with an optimized AC-DFA solution, our head-body matching (HBM) implementation achieves 1.2x to 3x the throughput when the input match (attack) ratio varies from 2% to 32%, respectively. Our HBM data structure is over 20x smaller than a fully-populated AC-DFA for both Snort and ClamAV dictionaries. The aggregated throughput of our HBM approach scales almost 7x with 8 threads to over 10 Gbps in a dual-socket quad-core Opteron (Shanghai) server.
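The head-body split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (build_head, hbm_search), the default head_len of 4, and the dictionary-of-lists layout are assumptions made for illustration. The head is a small Aho-Corasick automaton over pattern prefixes; the body step here is a plain suffix comparison standing in for the SIMD-packed, multi-stride body-tree traversal, which the abstract does not specify in detail.

```python
from collections import defaultdict, deque


def build_head(patterns, head_len):
    """Build an Aho-Corasick automaton over prefixes of at most head_len chars."""
    goto = [{}]                  # goto[state][char] -> next state
    fail = [0]                   # failure link per state
    out = [set()]                # head prefixes recognized at each state
    bodies = defaultdict(list)   # head prefix -> [(full pattern, body suffix)]
    for p in patterns:
        if not p:
            continue             # empty patterns are ignored in this sketch
        head, body = p[:head_len], p[head_len:]
        bodies[head].append((p, body))
        s = 0
        for ch in head:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(head)
    # Breadth-first pass to fill in failure links and merged outputs.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, out, bodies


def hbm_search(text, patterns, head_len=4):
    """Return sorted (start, pattern) pairs for every occurrence in text."""
    goto, fail, out, bodies = build_head(patterns, head_len)
    matches = []
    s = 0
    for i, ch in enumerate(text):
        # Head part: ordinary AC traversal over the truncated prefixes.
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for head in out[s]:
            start = i - len(head) + 1
            # Body part (simplified): extend the head match by comparing
            # each candidate's remaining suffix against the input.
            for full, body in bodies[head]:
                if text.startswith(body, i + 1):
                    matches.append((start, full))
    return sorted(matches)


if __name__ == "__main__":
    print(hbm_search("xxabcdefyyabcdzz", ["abcdef", "abcdxy", "cde", "abcdz"]))
```

Because the automaton only covers prefixes, its transition table grows with head_len rather than with full pattern length, which is the intuition behind the smaller memory footprint claimed above; the sketch makes no attempt to reproduce the reported throughput figures.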
Keywords :
data structures; deterministic automata; finite automata; parallel processing; security of data; string matching; Aho-Corasick algorithm; SIMD instructions; attack-resilient performance; data structures; deep packet inspection; deterministic finite automata; dictionary-based string matching; dual-socket quad-core Opteron server; head-body partitioned string matching; high-bandwidth network traffic; malicious pattern matching; modern x86-64 multi-core processors; parallel body-tree traversals; state transition table; Automata; Bandwidth; Data structures; Delay; Inspection; Large-scale systems; Partitioning algorithms; Pattern matching; Telecommunication traffic; Throughput; DFA; NFA; SIMD; String matching; intrusion detection; multi-core processor; multi-stride tree; tree topology; virus scanning;
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6442-5
DOI :
10.1109/IPDPS.2010.5470396