• DocumentCode
    1897589
  • Title
    Hardware-accelerated regular expression matching for high-throughput text analytics
  • Author
    Atasu, Kubilay ; Polig, Raphael ; Hagleitner, Christoph ; Reiss, Frederick R.
  • Author_Institution
    IBM Res., Zurich, Switzerland
  • fYear
    2013
  • fDate
    2-4 Sept. 2013
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Advanced text analytics systems combine regular expression (regex) matching, dictionary processing, and relational algebra for efficient information extraction from text documents. Such systems require support for advanced regex matching features, such as start offset reporting and capturing groups. However, existing regex matching architectures based on reconfigurable nondeterministic state machines and programmable deterministic state machines are not designed to support such features. We describe a novel architecture that supports such advanced features using a network of state machines. We also present a compiler that maps the regexes onto such networks, which can be efficiently realized on reconfigurable logic. For each regex, our compiler produces a state machine description, statically computes the number of state machines needed, and produces an optimized interconnection network. Experiments on an Altera Stratix IV FPGA, using regexes from a real-life text analytics benchmark, show that a throughput rate of 16 Gb/s can be reached.
  • Keywords
    field programmable gate arrays; finite state machines; knowledge acquisition; pattern matching; relational algebra; text analysis; Altera Stratix IV FPGA; bit rate 16 Gbit/s; capturing groups; compiler; dictionary processing; hardware-accelerated regular expression matching; high-throughput text analytics; information extraction; optimized interconnection network; programmable deterministic state machines; reconfigurable logic; reconfigurable nondeterministic state machines; regex matching architectures; start offset reporting; text documents; Delays; Dictionaries; Doped fiber amplifiers; Multiprocessor interconnection; Registers; Semantics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Field Programmable Logic and Applications (FPL), 2013 23rd International Conference on
  • Conference_Location
    Porto
  • Type
    conf
  • DOI
    10.1109/FPL.2013.6645534
  • Filename
    6645534