• DocumentCode
    2335220
  • Title
    CompactDFA: Generic State Machine Compression for Scalable Pattern Matching
  • Author
    Bremler-Barr, Anat; Hay, David; Koral, Yaron
  • Author_Institution
    Comput. Sci. Dept., Interdiscipl. Center, Herzliya, Israel
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    1
  • Lastpage
    9
  • Abstract
    Pattern matching algorithms lie at the core of all contemporary Intrusion Detection Systems (IDS), making it essential to increase their speed and reduce their memory requirements. This paper focuses on the most popular class of pattern-matching algorithms, the Aho-Corasick-like algorithms, which are based on constructing and traversing a Deterministic Finite Automaton (DFA) that represents the patterns. While this approach ensures deterministic time guarantees, modern IDSs need to deal with hundreds of patterns, which requires storing very large DFAs that usually do not fit in fast memory. This creates a major bottleneck for the throughput of the IDS and increases its power consumption and cost. We propose a novel method to compress DFAs by observing that the names of the states are meaningless. While regular DFAs store each transition between two states separately, we use this degree of freedom and encode states in such a way that all transitions to a specific state can be represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be characterized by simple properties. The problem of pattern matching is then reduced to the well-studied problem of Longest Prefix Matching (LPM), which can be solved either in TCAM, in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gbps with low power consumption.
  • Keywords
    data compression; deterministic automata; finite state machines; pattern matching; security of data; Aho-Corasick-like algorithms; IP-lookup chips; TCAM; bit rate 10 Gbit/s; CompactDFA; deterministic finite automaton; generic state machine compression; intrusion detection systems; longest prefix matching; low power consumption; power consumption; scalable pattern matching algorithms; Automata; Communications Society; Computer science; Doped fiber amplifiers; Energy consumption; Hardware; Intrusion detection; Pattern matching; Throughput; USA Councils
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    INFOCOM, 2010 Proceedings IEEE
  • Conference_Location
    San Diego, CA
  • ISSN
    0743-166X
  • Print_ISBN
    978-1-4244-5836-3
  • Type
    conf
  • DOI
    10.1109/INFCOM.2010.5462160
  • Filename
    5462160
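
The abstract's central idea, encoding states so that every transition into a state collapses to one prefix rule resolved by longest prefix matching, can be illustrated with a small software sketch. This is not the paper's encoding: it assumes an Aho-Corasick automaton, uses tuples of integers rather than bit strings as state codes, and derives the codes from the failure-link tree so that a state's code extends the code of its failure state. The names `build_compact_rules`, `next_state`, and `scan` are hypothetical.

```python
from collections import deque


def build_compact_rules(patterns):
    """Build one (code prefix, symbol) -> state rule per non-root state."""
    # Aho-Corasick trie: children[s][ch] is the child of state s on ch.
    children = [{}]
    out = [set()]  # out[s]: patterns recognised on entering state s
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in children[s]:
                children.append({})
                out.append(set())
                children[s][ch] = len(children) - 1
            s = children[s][ch]
        out[s].add(pat)

    # Failure links, computed breadth-first (shallower states first).
    fail = [0] * len(children)
    order = []
    queue = deque(children[0].values())
    while queue:
        s = queue.popleft()
        order.append(s)
        for ch, t in children[s].items():
            f = fail[s]
            while f and ch not in children[f]:
                f = fail[f]
            fail[t] = children[f].get(ch, 0)
            out[t] |= out[fail[t]]
            queue.append(t)

    # Assumed encoding: code(s) = code(fail[s]) + (sibling index,), so the
    # code of every suffix state is a prefix of code(s).
    code = {0: ()}
    n_kids = [0] * len(children)
    for s in order:
        p = fail[s]
        code[s] = code[p] + (n_kids[p],)
        n_kids[p] += 1

    # One rule per trie edge, i.e. one rule per non-root state.
    rules = {}
    for p, kids in enumerate(children):
        for ch, t in kids.items():
            rules[(code[p], ch)] = t
    return rules, code, out


def next_state(rules, code, state, ch):
    """Software longest prefix match over the current state's code."""
    c = code[state]
    for k in range(len(c), -1, -1):
        t = rules.get((c[:k], ch))
        if t is not None:
            return t
    return 0  # default rule: fall back to the root


def scan(patterns, text):
    """Return (start index, pattern) for every match in text."""
    rules, code, out = build_compact_rules(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = next_state(rules, code, state, ch)
        for pat in sorted(out[state]):
            hits.append((i - len(pat) + 1, pat))
    return hits
```

In this sketch the rule table holds one entry per non-root state instead of one entry per (state, symbol) pair. The loop in `next_state` stands in for the TCAM or IP-lookup hardware named in the abstract.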