An Automatic Parallel-Stage Decoupled Software Pipelining Parallelization Algorithm Based on OpenMP

Author

Xiaoxian Liu ; Rongcai Zhao ; Lin Han ; Peng Liu

Author_Institution

State Key Lab. of Math. Eng., Adv. Comput., Zhengzhou, China

fYear

2013

fDate

16-18 July 2013

Firstpage

1825

Lastpage

1831

Abstract

While multicore processors increase throughput for multiprogrammed and multithreaded codes, many important applications are single-threaded and thus do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders traditional automatic parallelization techniques unsuitable. Parallel-Stage Decoupled Software Pipelining (PS-DSWP) was proposed to exploit, at the instruction level, the fine-grained pipeline parallelism latent in ordinary programs in the presence of all kinds of dependences, including arbitrary control dependences. However, it requires knowledge of architectural properties as well as hardware support for a communication channel and two special instructions. In this paper we propose an improved PS-DSWP algorithm based on OpenMP. It uses a high-level intermediate representation, so its implementation does not rely on a particular CPU architecture. Moreover, the Program Dependence Graph (PDG) used in the algorithm is built over basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built over instructions. OpenMP is used to assign tasks and to synchronize threads, which avoids dependence on hardware support. We evaluate the algorithm on a multicore platform using loops with complex memory patterns and control flow that traditional techniques cannot handle. With our algorithm these loops can be parallelized and gain significant performance improvement. We obtain a maximum speedup of 2.07x and an average speedup of 1.39x with 5 threads.
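
The pipeline structure that the abstract describes can be illustrated with a minimal sketch. This is not the authors' OpenMP implementation: it is a Python analogue, written under the assumption that a pointer-chasing loop is split into a sequential traversal stage, a replicated (parallel) work stage, and a sequential reduction stage, with software queues standing in for the communication channel. All names (`Node`, `build_list`, `ps_dswp_sum`, `work`, `num_workers`) are hypothetical and chosen for illustration only.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


class Node:
    """Singly linked list node: a simple recursive data structure."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_list(values):
    """Build a linked list that holds the values in the given order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def ps_dswp_sum(head, work, num_workers=3):
    """Sum work(node.value) over the list using a three-stage pipeline."""
    q_in = queue.Queue()   # stage 1 -> stage 2
    q_out = queue.Queue()  # stage 2 -> stage 3

    def traverse():
        # Stage 1 (sequential): pointer chasing carries a loop-carried
        # dependence, so a single thread performs it.
        node = head
        while node is not None:
            q_in.put(node.value)
            node = node.next
        for _ in range(num_workers):
            q_in.put(SENTINEL)  # one stop marker per worker

    def worker():
        # Stage 2 (parallel): iterations are independent here, so the
        # stage is replicated across several threads.
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)
                return
            q_out.put(work(item))

    threads = [threading.Thread(target=traverse)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Stage 3 (sequential, on the calling thread): the reduction.
    total = 0
    finished = 0
    while finished < num_workers:
        item = q_out.get()
        if item is SENTINEL:
            finished += 1
        else:
            total += item

    for t in threads:
        t.join()
    return total
```

For example, `ps_dswp_sum(build_list([1, 2, 3, 4]), lambda x: x * x)` sums the squares of the list elements. The sketch only shows how the stages are decoupled; because of Python's global interpreter lock it should not be expected to reproduce the speedups reported above.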

Keywords

application program interfaces; data structures; multi-threading; multiprocessing programs; multiprocessing systems; pipeline processing; synchronisation; CPU architectures; OpenMP; PDG; PS-DSWP algorithm; PS-DSWP transformation; architectural properties; automatic parallel-stage decoupled software pipelining parallelization algorithm; automatic parallelization techniques; coarser-grained parallelism; communication channel; complex memory patterns; fine-grained pipeline parallelism lurking; high level intermediate representation; multicore platforms; multicore processors; multiprogrammed codes; multithreaded codes; program dependence graph; recursive data structures; single threaded applications; Instruction sets; Merging; Multicore processing; Pipeline processing; Synchronization; OpenMP; automatic parallelization; parallel-stage decoupled software pipelining;

fLanguage

English

Publisher

IEEE

Conference_Titel

Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on

Conference_Location

Melbourne, VIC

Type

conf

DOI

10.1109/TrustCom.2013.227

Filename

6681059

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=652364