Title :
Speeding up pattern matching by optimal partial string extraction
Author :
Tan Jianlong ; Liu Xia ; Liu Yanbing ; Liu Ping
Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
Abstract :
String matching plays a key role in web content monitoring systems. Suffix matching algorithms have good time efficiency, and thus are widely used. These algorithms require that all patterns in a set have the same length. When the patterns cannot satisfy this requirement, the leftmost characters, m being the length of the shortest pattern, are extracted to construct the data structure. We call such m-character strings partial strings. However, a simple extraction from the left does not address the impact of partial string locations on search speed. We propose a novel method to extract the partial strings from each pattern which maximizes search speed. More specifically, with this method we can compute all the corresponding searching time cost by theoretical derivation, and choose the location which yields an approximately minimal search time. We evaluate our method on two rule sets: Snort and ClamAV. Experiments show that in most cases, our method achieves the fastest searching speed in all possible locations of partial string extraction, and is about 5%-20% faster than the alternative methods.
Keywords :
Internet; feature extraction; string matching; ClamAV; Internet; Snort; Web content monitoring system; optimal partial string extraction; pattern matching; searching time cost; string matching; suffix matching algorithm; Algorithm design and analysis; Automata; Bills of materials; Data mining; Data structures; Pattern matching; Training; SBOM algorithm; Wu Manber algorithm; multiple string matching; pattern matching; string matching;
Conference_Titel :
Computer Communications Workshops (INFOCOM WKSHPS), 2011 IEEE Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4577-0249-5
Electronic_ISBN :
978-1-4577-0248-8
DOI :
10.1109/INFCOMW.2011.5928778