DocumentCode
652364
Title
An Automatic Parallel-Stage Decoupled Software Pipelining Parallelization Algorithm Based on OpenMP
Author
Xiaoxian Liu ; Rongcai Zhao ; Lin Han ; Peng Liu
Author_Institution
State Key Lab. of Math. Eng. & Adv. Comput., Zhengzhou, China
fYear
2013
fDate
16-18 July 2013
Firstpage
1825
Lastpage
1831
Abstract
While multicore processors increase throughput for multiprogrammed and multithreaded codes, many important applications are single-threaded and thus do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders traditional automatic parallelization techniques unsuitable. Parallel-Stage Decoupled Software Pipelining (PS-DSWP) was proposed to exploit, at the instruction level, the fine-grained pipeline parallelism present in ordinary programs despite all kinds of dependences, including arbitrary control dependences. However, it requires knowledge of architectural properties as well as hardware support in the form of a communication channel and two special instructions. In this paper we propose an improved PS-DSWP algorithm based on OpenMP. It uses a high-level intermediate representation and therefore does not rely on a particular CPU architecture. Moreover, the Program Dependence Graph (PDG) used in the algorithm is built over basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built over instructions. OpenMP is used to assign tasks and to implement synchronization among threads, which avoids any dependence on hardware support. We evaluate the algorithm on a multicore platform using loops with complex memory access patterns and control flow that traditional techniques cannot handle. With our algorithm these loops can be parallelized and gain significant performance improvements: with 5 threads we obtain a maximum speedup of 2.07x and an average speedup of 1.39x.
Keywords
application program interfaces; data structures; multi-threading; multiprocessing programs; multiprocessing systems; pipeline processing; synchronisation; CPU architectures; OpenMP; PDG; PS-DSWP algorithm; PS-DSWP transformation; architectural properties; automatic parallel-stage decoupled software pipelining parallelization algorithm; automatic parallelization techniques; coarser-grained parallelism; communication channel; complex memory patterns; fine-grained pipeline parallelism lurking; high level intermediate representation; multicore platforms; multicore processors; multiprogrammed codes; multithreaded codes; program dependence graph; recursive data structures; single threaded applications; Instruction sets; Merging; Multicore processing; Pipeline processing; Synchronization; OpenMP; automatic parallelization; parallel-stage decoupled software pipelining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location
Melbourne, VIC
Type
conf
DOI
10.1109/TrustCom.2013.227
Filename
6681059