• DocumentCode
    737957
  • Title

    MIN-MAX: A Counter-Based Algorithm for Regular Expression Matching

  • Author

    Wang, Hao ; Pu, Shi ; Knezek, Gabe ; Liu, Jyh-Charn

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Texas A&M Univ., College Station, TX, USA
  • Volume
    24
  • Issue
    1
  • fYear
    2013
  • Firstpage
    92
  • Lastpage
    103
  • Abstract
    We propose an NFA-based algorithm called MIN-MAX to support matching of regular expressions (regexp) composed of Character Classes with Constraint Repetitions (CCR). MIN-MAX is well suited for massive parallel processing architectures, such as FPGAs, yet it is effective on any other computing platform. In MIN-MAX, each active CCR engine (which implements one CCR term) evaluates input characters, updates (MIN, MAX) counters, and asserts control signals, and all the CCR engines implemented in the FPGA run simultaneously. Unlike traditional designs, (MIN, MAX) counters contain dynamically updated lower and upper bounds of possible matching counts, instead of actual matching counts, so that feasible matching lengths are compactly enclosed in the counter value. The counter-based design can support constraint repetitions of n using O(log n) memory bits rather than the O(n) required by existing solutions. MIN-MAX can resolve character class ambiguity between adjacent CCR terms and support overlapped matching when matching collisions are absent. We developed a set of heuristic rules to assess the absence of collision for CCR-based regexps, and tested them on Snort and SpamAssassin rule sets. The results show that the vast majority of rules are immune from collisions, so that MIN-MAX can cost-effectively support overlapped matching. As a bonus, the new architecture also supports fast reconfiguration via ordinary memory writes rather than resynthesis of the entire design, which is critical for time-sensitive regexp deployment scenarios.
  • Keywords
    computational complexity; field programmable gate arrays; finite automata; parallel architectures; pattern matching; reconfigurable architectures; CCR engine; CCR-based regexps; FPGA; MIN-MAX; NFA-based algorithm; Snort; SpamAssassin rule sets; character classes with constraint repetitions; control signals; counter value; counter-based algorithm; counter-based design; massive parallel processing architectures; matching collisions; matching counts; memory bits; regular expression matching; time-sensitive regexp deployment scenarios; Algorithm design and analysis; Computer architecture; Doped fiber amplifiers; Engines; Field programmable gate arrays; Radiation detectors; Registers; Nondeterministic Finite Automata; algorithm design and analysis; reconfigurable hardware;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2012.116
  • Filename
    6178246