• DocumentCode
    679652
  • Title
    DistCL: A Framework for the Distributed Execution of OpenCL Kernels
  • Author
    Diop, Tahir ; Gurfinkel, Steven ; Anderson, Jon ; Jerger, Natalie Enright
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Toronto, Toronto, ON, Canada
  • fYear
    2013
  • fDate
    14-16 Aug. 2013
  • Firstpage
    556
  • Lastpage
    566
  • Abstract
    GPUs are used to speed up many scientific computations; however, to use several networked GPUs concurrently, the programmer must explicitly partition work and transmit data between devices. We propose DistCL, a novel framework that distributes the execution of OpenCL kernels across a GPU cluster. DistCL makes multiple distributed compute devices appear to be a single compute device. DistCL abstracts and manages many of the challenges associated with distributing a kernel across multiple devices, including: (1) partitioning work into smaller parts, (2) scheduling these parts across the network, (3) partitioning memory so that each part of memory is written to by at most one device, and (4) tracking and transferring these parts of memory. Converting an OpenCL application to DistCL is straightforward and requires little programmer effort. This makes it a powerful and valuable tool for exploring the distributed execution of OpenCL kernels. We compare DistCL to SnuCL, which also facilitates the distribution of OpenCL kernels. We also give some insights: distributed tasks favor compute-bound problems and large contiguous memory accesses. DistCL achieves a maximum speedup of 29.1 and an average speedup of 7.3 when distributing kernels among 32 peers over an Infiniband cluster.
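As a rough illustration of points (1) and (3) in the abstract, the sketch below splits a one-dimensional index range into contiguous parts, one per device, so that no index is written by more than one device. This is not DistCL's code or API: the function name `partition_range` and the even-split policy are assumptions made only for this example, and the abstract does not say how DistCL actually chooses its partitions.

```python
def partition_range(global_size, num_devices):
    """Split [0, global_size) into contiguous, non-overlapping sub-ranges.

    Hypothetical helper for illustration only; not part of DistCL.
    Returns a list of (start, end) pairs, one per device, with end exclusive.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if global_size < 0:
        raise ValueError("global_size must be non-negative")

    # Every device gets `base` items; the first `extra` devices get one more.
    base, extra = divmod(global_size, num_devices)
    parts = []
    start = 0
    for dev in range(num_devices):
        length = base + (1 if dev < extra else 0)
        parts.append((start, start + length))
        start += length
    return parts


if __name__ == "__main__":
    # Ten work-items across four devices (assumed sizes, for illustration).
    for dev, (lo, hi) in enumerate(partition_range(10, 4)):
        print(f"device {dev}: work-items [{lo}, {hi})")
```

Because the sub-ranges are contiguous and disjoint, each device in this sketch would write to its own slice of an output buffer, which matches the abstract's observation that distributed execution favors large contiguous memory accesses.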
  • Keywords
    graphics processing units; parallel processing; scheduling; DistCL; GPU cluster; Infiniband cluster; OpenCL kernel distributed execution; SnuCL; compute bound problems; contiguous memory access; data-parallel task; distributed tasks; memory partitioning; memory tracking; memory transferring; multiple distributed compute devices; part scheduling; work partitioning; Benchmark testing; Distributed databases; Graphics processing units; Kernel; Memory management; Peer-to-peer computing; Vectors; Distributed Computing; GPU; Infiniband; OpenCL;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2013 IEEE 21st International Symposium on
  • Conference_Location
    San Francisco, CA
  • ISSN
    1526-7539
  • Type
    conf
  • DOI
    10.1109/MASCOTS.2013.77
  • Filename
    6730812