• DocumentCode
    652364
  • Title
    An Automatic Parallel-Stage Decoupled Software Pipelining Parallelization Algorithm Based on OpenMP
  • Author
    Xiaoxian Liu ; Rongcai Zhao ; Lin Han ; Peng Liu
  • Author_Institution
    State Key Lab. of Math. Eng., Adv. Comput., Zhengzhou, China
  • fYear
    2013
  • fDate
    16-18 July 2013
  • Firstpage
    1825
  • Lastpage
    1831
  • Abstract
    While multicore processors increase throughput for multi-programmed and multithreaded codes, many important applications are single-threaded and thus do not benefit. Automatic parallelization techniques play an important role in migrating single-threaded applications to multicore platforms. Unfortunately, the prevalence of control flow, recursive data structures, and general pointer accesses in ordinary programs renders traditional automatic parallelization techniques unsuitable. Parallel-Stage Decoupled Software Pipelining (PS-DSWP) was proposed to exploit, at the instruction level, the fine-grained pipeline parallelism lurking in ordinary programs in the presence of all kinds of dependences, including arbitrary control dependences. However, it requires knowledge of architectural properties and hardware support for a communication channel and two special instructions. In this paper, we propose an improved PS-DSWP algorithm based on OpenMP. It is implemented independently of the CPU architecture by using a high-level intermediate representation. Moreover, the Program Dependence Graph (PDG) used in the algorithm is built from basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built from instructions. OpenMP is employed in our algorithm to assign tasks and implement synchronization among threads while avoiding dependence on hardware support. We evaluate loops with complex memory patterns and control flow, which traditional techniques cannot handle, on a multicore platform. With our algorithm, these loops can be parallelized and gain significant performance improvement. We obtain a maximum speedup of 2.07x and an average speedup of 1.39x with 5 threads.
  • Keywords
    application program interfaces; data structures; multi-threading; multiprocessing programs; multiprocessing systems; pipeline processing; synchronisation; CPU architectures; OpenMP; PDG; PS-DSWP algorithm; PS-DSWP transformation; architectural properties; automatic parallel-stage decoupled software pipelining parallelization algorithm; automatic parallelization techniques; coarser-grained parallelism; communication channel; complex memory patterns; fine-grained pipeline parallelism lurking; high level intermediate representation; multicore platforms; multicore processors; multiprogrammed codes; multithreaded codes; program dependence graph; recursive data structures; single threaded applications; Instruction sets; Merging; Multicore processing; Pipeline processing; Synchronization; OpenMP; automatic parallelization; parallel-stage decoupled software pipelining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • Type
    conf
  • DOI
    10.1109/TrustCom.2013.227
  • Filename
    6681059