Title :
Extendable framework for monitoring heterogeneous multi-accelerator HPC cluster
Author :
Deepika, H.V. ; Mangala, N. ; Babu, N. Sarat Chandra
Author_Institution :
Hybrid Comput. Group, Centre for Dev. of Adv. Comput., Bangalore, India
Abstract :
The superior performance-to-power ratio of accelerators is motivating new cluster architectures with varied accelerator combinations. Monitoring ensures normal functioning of the cluster by detecting service degradations and enabling prompt rectification. This paper describes a modular and extendable monitoring framework for heterogeneous multi-accelerator clusters, which will be useful for future HPC systems. The framework supports third-party software plugins that provide different functional features. A monitoring tool has been developed on the basis of this framework to monitor CPU, GPGPU and FPGA accelerators, network, storage, user jobs and other relevant services of a heterogeneous cluster; the tool is also capable of automatic rectification to a certain extent.
Keywords :
field programmable gate arrays; graphics processing units; parallel processing; CPU accelerators; FPGA accelerators; GPGPU accelerators; HPC systems; accelerator combinations; accelerators power ratio; central processing unit; cluster monitoring; extendable monitoring framework; general-purpose graphics processing unit; heterogeneous multiaccelerator HPC cluster; high performance computing; service degradations; third party software plugins; Computer architecture; Databases; Monitoring; Probes; Servers; FPGA; GPU; Many core; accelerator; cluster; heterogeneous; plugin
Conference_Title :
2014 International Conference on Computing for Sustainable Global Development (INDIACom)
Conference_Location :
New Delhi
Print_ISBN :
978-93-80544-10-6
DOI :
10.1109/IndiaCom.2014.6828136