DocumentCode
169128
Title
SLURM Support for Remote GPU Virtualization: Implementation and Performance Study
Author
Iserte, Sergio ; Castello, Adrian ; Mayo, Rafael ; Quintana-Orti, Enrique S. ; Silla, Federico ; Duato, Jose ; Reano, Carlos ; Prades, Javier
Author_Institution
Univ. Jaume I de Castello, Castello de la Plana, Spain
fYear
2014
fDate
22-24 Oct. 2014
Firstpage
318
Lastpage
325
Abstract
SLURM is a resource manager that can be used to share a collection of heterogeneous resources among the jobs running in a cluster. However, SLURM is not designed to handle resources such as graphics processing units (GPUs). Specifically, although SLURM can use a generic resource plugin (GRes) to manage GPUs, with this solution the hardware accelerators can only be accessed by the job running on the node to which the GPU is attached. This is a serious constraint for remote GPU virtualization technologies, which aim to provide user-transparent access to all GPUs in the cluster, regardless of where the node running the application is located relative to the GPU node. In this work we introduce a new type of device in SLURM, "rgpu", which gives any application node access to any GPU node in the cluster, using rCUDA as the remote GPU virtualization solution. With this new scheduling mechanism, a user can access any number of GPUs, because SLURM schedules tasks taking into account all the graphics accelerators available across the entire cluster. We present experimental results that show the benefits of this new approach in terms of increased flexibility for the job scheduler.
Keywords
graphics processing units; parallel architectures; resource allocation; virtualisation; SLURM; generic resource plugin; graphics accelerator; graphics processing unit; hardware accelerator; rCUDA; remote GPU virtualization; resource manager; scheduling mechanism; user-transparent access; Acceleration; Computer architecture; Middleware; Resource management; Throughput; Virtualization; HPC cluster; job scheduler
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on
Conference_Location
Jussieu
ISSN
1550-6533
Type
conf
DOI
10.1109/SBAC-PAD.2014.49
Filename
6970680