Title :
Bypassing Space Explosion in High-Speed Regular Expression Matching
Author :
Patel, Jatin ; Liu, Alex X. ; Torng, Eric
Author_Institution :
Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
Abstract :
Network intrusion detection and prevention systems commonly use regular expression (RE) signatures to represent individual security threats. While the corresponding deterministic finite state automata (DFA) for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, a variety of alternative automata implementations that compress the size of the final automaton have been proposed such as extended finite automata (XFA) and delayed input DFA (D 2FA). The resulting final automata are typically much smaller than the corresponding DFA. However, the previously proposed automata construction algorithms do suffer from some drawbacks. First, most employ a “Union then Minimize” framework where the automata for each RE are first joined before minimization occurs. This leads to an expensive nondeterministic finite automata (NFA) to DFA subset construction on a relatively large NFA. Second, most construct the corresponding large DFA as an intermediate step. In some cases, this DFA is so large that the final automaton cannot be constructed even though the final automaton is small enough to be deployed. In this paper, we propose a “Minimize then Union” framework for constructing compact alternative automata focusing on the D 2FA. We show that we can construct an almost optimal final D 2FA with small intermediate parsers. The key to our approach is a space- and time-efficient routine for merging two compact D 2FA into a compact D 2FA. In our experiments, our algorithm runs on average 155 times faster and uses 1500 times less memory than previous algorithms. For example, we are able to construct a D 2FA with over 80 000 000 states using only 1 GB of main memory in only 77 min.
Keywords :
deterministic automata; finite state machines; pattern matching; telecommunication security; D2FA; NFA; RE; XFA; bypassing space explosion; delayed input DFA; deterministic finite state automata; expensive nondeterministic finite automata; extended finite automata; high-speed regular expression matching; minimize then union framework; network intrusion detection; network prevention systems; regular expression signatures; security threats; union then minimize framework; Automata; Explosions; Intrusion detection; Memory management; Minimization; Standards; Deep packet inspection; information security; intrusion detection and prevention; network security; regular expression matching;
Journal_Title :
Networking, IEEE/ACM Transactions on
DOI :
10.1109/TNET.2014.2309014