DocumentCode :
246363
Title :
High Performance Parallelization of Boyer-Moore Algorithm on Many-Core Accelerators
Author :
Yosang Jeong ; Myungho Lee ; Dukyun Nam ; Jik-Soo Kim ; Soonwook Hwang
Author_Institution :
Dept. of Comput. Sci. & Eng., Myongji Univ., Yong In, South Korea
fYear :
2014
fDate :
8-12 Sept. 2014
Firstpage :
265
Lastpage :
272
Abstract :
The Boyer-Moore (BM) algorithm is a single-pattern string matching algorithm. It is considered the most efficient string matching algorithm and is used in many applications. In the preprocessing phase, the algorithm first calculates two string shift rules based on the given pattern string. These rules help skip parts of the target input string where no match can be found. In the second phase, pattern matching operations are performed against the target input string using the two shift rules. The second phase is time-consuming and needs to be parallelized to achieve high-performance string matching. In this paper, we parallelize the BM algorithm on the latest many-core accelerators, such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, along with general-purpose multi-core processors. We partition the target input data among multiple threads for parallel execution. Data lying on the threads' boundaries need to be copied redundantly so that a pattern string lying on a boundary can be found. As the target length increases, the algorithm incurs more matching operations. Also, as the pattern length increases, the number of possible matches decreases. This can lead to an unbalanced workload distribution among threads. Furthermore, the redundant data copy significantly overloads the on-chip shared memories of the GPU for a large number of threads. We use dynamic scheduling and multithreading techniques to solve the load balancing problem. We also use the algorithmic cascading technique to reduce the burden on the shared memories of the GPU. Our parallel implementation achieves a ~17-times speedup on the Xeon Phi and a ~45-times speedup on the Nvidia Tesla K20 GPU compared with a serial implementation on the host Intel Xeon processor.
Keywords :
graphics processing units; microprocessor chips; multi-threading; multiprocessing systems; parallel processing; processor scheduling; string matching; Boyer-Moore algorithm; Intel Xeon Phi; Nvidia Tesla K20 GPU; algorithmic cascading technique; dynamic scheduling; high performance parallelization; load balancing problem; many-core accelerator; multithreading technique; pattern string matching algorithm; string shift rules; Algorithm design and analysis; Heuristic algorithms; Instruction sets; Multicore processing; Partitioning algorithms; Pattern matching; algorithmic cascading; multithreading; parallelization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud and Autonomic Computing (ICCAC), 2014 International Conference on
Conference_Location :
London
Type :
conf
DOI :
10.1109/ICCAC.2014.20
Filename :
7024070