Title :
Tara: An algorithm for fast searching of multiple patterns on text files
Author :
M. Oguzhan Kulekci
Author_Institution :
Turkish Army Gendarme Headquarter, Beştepe, Ankara, 41470 Turkey
Abstract :
This work introduces tara, a new multi-pattern matching algorithm that uses bit-parallelism to search text files for fixed-length strings very fast. Bounded gaps and character classes in keywords are also supported. Although its worst-case time complexity is quadratic, it is fast in practice. Experiments on natural-language text compare the proposed algorithm with the widely used GNU grep file search utility and with 9 variants of the Aho-Corasick (AC) and Commentz-Walter (CW) algorithms. On average, tara is approximately 10% faster than grep, with speed-ups of up to 70% observed. Against the AC and CW variants, tara is 3.5 times faster than the next-fastest algorithm.
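The abstract does not describe tara's internals. As a hedged illustration of the kind of bit-parallelism it relies on, the sketch below uses the textbook Shift-And technique, packing k patterns of equal length m into one bit vector of k*m bits so that a single shift-or-and step per text character advances all patterns at once. This is not tara itself; the function names `build_masks` and `search` are hypothetical, Python integers stand in for machine words, character classes are modeled as a string of accepted characters per position, and bounded gaps are not handled.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_masks(patterns: Sequence[Sequence[str]], m: int) -> Tuple[Dict[str, int], int, int]:
    """Pack k patterns of length m into one bit vector of k*m bits (hypothetical helper).

    Each pattern position is a string of the characters it accepts, so a plain
    string like "she" is a literal pattern and ["s", "ae", "al"] is s[ae][al].
    """
    masks: Dict[str, int] = defaultdict(int)
    init = 0   # bit i*m set: every pattern may start at any text position
    final = 0  # bit i*m + m - 1 set: pattern i has been fully matched
    for i, pattern in enumerate(patterns):
        if len(pattern) != m:
            raise ValueError("all patterns must have the same length m")
        base = i * m
        init |= 1 << base
        final |= 1 << (base + m - 1)
        for j, allowed in enumerate(pattern):
            for ch in allowed:
                masks[ch] |= 1 << (base + j)
    return masks, init, final


def search(text: str, patterns: Sequence[Sequence[str]]) -> List[Tuple[int, int]]:
    """Return (end_position, pattern_index) for every occurrence in text."""
    if not patterns:
        return []
    m = len(patterns[0])
    masks, init, final = build_masks(patterns, m)
    state = 0
    hits: List[Tuple[int, int]] = []
    for pos, ch in enumerate(text):
        # One Shift-And step for all patterns at once. A bit shifted out of
        # block i lands on the start bit of block i + 1, which init sets anyway,
        # and bits shifted past the last block are cleared by the AND.
        state = ((state << 1) | init) & masks.get(ch, 0)
        matched = state & final
        while matched:
            low = matched & -matched  # isolate the lowest set bit
            hits.append((pos, (low.bit_length() - 1) // m))
            matched ^= low
    return hits
```

For example, `search("she sells seashells", ["she", "ell", ["s", "ae", "al"]])` reports each match by the text position where it ends and the index of the pattern. The per-character cost is a constant number of word operations as long as k*m fits in a machine word; longer pattern sets need several words.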
Keywords :
"Automata","Pattern matching","AC machines","Natural languages","Performance analysis","Taxonomy","Parallel processing","Digital arithmetic","Algorithm design and analysis","Utility programs"
Conference_Title :
Computer and Information Sciences, 2007. ISCIS 2007. 22nd International Symposium on
Print_ISBN :
978-1-4244-1363-8
DOI :
10.1109/ISCIS.2007.4456850