• DocumentCode
    2787148
  • Title
    Dynamic Fine-Grain Scheduling of Pipeline Parallelism
  • Author
    Sanchez, Daniel ; Lo, David ; Yoo, Richard M. ; Sugerman, Jeremy ; Kozyrakis, Christos
  • Author_Institution
    Pervasive Parallelism Lab., Stanford Univ., Stanford, CA, USA
  • fYear
    2011
  • fDate
    10-14 Oct. 2011
  • Firstpage
    22
  • Lastpage
    32
  • Abstract
    Scheduling pipeline-parallel programs, defined as a graph of stages that communicate explicitly through queues, is challenging. When the application is regular and the underlying architecture can guarantee predictable execution times, several techniques exist to compute highly optimized static schedules. However, these schedules do not admit run-time load balancing, so variability introduced by the application or the underlying hardware causes load imbalance, hindering performance. On the other hand, existing schemes for dynamic fine-grain load balancing (such as task-stealing) do not work well on pipeline-parallel programs: they cannot guarantee memory footprint bounds, and do not adequately schedule complex graphs or graphs with ordered queues. We present a scheduler implementation for pipeline-parallel programs that performs fine-grain dynamic load balancing efficiently. Specifically, we implement the first real runtime for GRAMPS, a recently proposed programming model that focuses on supporting irregular pipeline and data-parallel applications (in contrast to classical stream programming models and schedulers, which require programs to be regular). Task-stealing with per-stage queues and queuing policies, coupled with a backpressure mechanism, allow us to maintain strict footprint bounds, and a buffer management scheme based on packet-stealing allows low-overhead and locality-aware dynamic allocation of queue data. We evaluate our runtime on a multi-core SMP and find that it provides low-overhead scheduling of irregular workloads while maintaining locality. We also show that the GRAMPS scheduler outperforms several other commonly used scheduling approaches. 
    Specifically, while a typical task-stealing scheduler performs on par with GRAMPS on simple graphs, it does significantly worse on complex ones; a canonical GPGPU scheduler cannot exploit pipeline parallelism and suffers from large memory footprints; and a typical static, streaming scheduler achieves somewhat better locality, but suffers significant load imbalance on a general-purpose multi-core due to fine-grain architecture variability (e.g., cache misses and SMT).
  • Keywords
    multiprocessing systems; parallel programming; queueing theory; resource allocation; scheduling; GRAMPS; backpressure mechanism; buffer management scheme; canonical GPGPU scheduler; complex graphs; dynamic fine-grain load balancing; dynamic fine-grain scheduling; fine-grain architecture variability; general-purpose multicore; multicore SMP; multicore chips; packet-stealing; per-stage queues; pipeline parallelism; pipeline-parallel program scheduling; predictable execution times; programming model; queue data locality-aware dynamic allocation; queuing policies; stream programming models; task-stealing scheduler; Dynamic scheduling; Instruction sets; Parallel processing; Programming; Runtime; Schedules;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
  • Conference_Location
    Galveston, TX
  • ISSN
    1089-795X
  • Print_ISBN
    978-1-4577-1794-9
  • Type
    conf
  • DOI
    10.1109/PACT.2011.9
  • Filename
    6113785