DocumentCode :
167210
Title :
HCW 2014 Keynote Talk
Author :
Abramson, David
Author_Institution :
Univ. of Queensland, Brisbane, QLD, Australia
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
6
Lastpage :
6
Abstract :
Summary form only given. CCDB implements a strategy called "Comparative Debugging", which helps trace software errors by comparing two executions of a program at the same time: one is a reference version and the other is faulty. Specifically, users write "assertions" that detect when data structure contents in the two executions diverge; by following the dataflow of the code, it is then possible to locate the source of the divergence. Comparative debugging is effective at finding errors when code is migrated from one platform to another, which is of significant interest for hybrid computer architectures containing CPUs and accelerators. In this talk I will discuss the design and implementation of CCDB, and show that it operates on highly parallel hybrid CPU/GPU systems. CCDB provides a uniform comparison interface that allows programmers to examine the global runtime status across different types of hybrid programs, including OpenACC and UPC programs. I will present a case study in finding errors using the hybrid version of the stellarator particle simulation DELTA5D on the Titan machine at ORNL. I will also illustrate that the debugger scales well and is effective with up to 10,000 nodes and 5,000 GPUs.
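The comparison idea described in the abstract can be illustrated with a minimal sketch. This is not CCDB's interface or implementation; the function `first_divergence`, the snapshot layout, and the variable name `pos` are all hypothetical, chosen only to show what an "assertion" over two executions might check: that a named data structure holds the same contents in both runs at each step, reporting the first point where it does not.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical layout: one snapshot per program step, mapping a
# variable name to the array contents captured at that step.
Snapshot = Dict[str, Sequence[float]]


def first_divergence(
    reference: List[Snapshot],
    suspect: List[Snapshot],
    name: str,
    tol: float = 1e-9,
) -> Optional[Tuple[int, int]]:
    """Return (step, index) of the first element of `name` that differs
    between the two runs beyond `tol`, or None if they never diverge."""
    for step, (ref, sus) in enumerate(zip(reference, suspect)):
        a, b = ref[name], sus[name]
        if len(a) != len(b):
            # A shape mismatch counts as divergence at the first missing index.
            return (step, min(len(a), len(b)))
        for i, (x, y) in enumerate(zip(a, b)):
            if not math.isclose(x, y, rel_tol=tol, abs_tol=tol):
                return (step, i)
    return None


# Toy data standing in for a reference run and a ported (faulty) run.
reference_run = [{"pos": [0.0, 1.0]}, {"pos": [0.5, 1.5]}, {"pos": [1.0, 2.0]}]
suspect_run = [{"pos": [0.0, 1.0]}, {"pos": [0.5, 1.25]}, {"pos": [1.0, 1.5]}]

print(first_divergence(reference_run, suspect_run, "pos"))  # (1, 1)
```

In this toy setting the two runs agree at step 0 and first differ at step 1, element 1. A real comparative debugger would compare live processes rather than recorded lists, and would then follow the dataflow backwards from the divergent variable to find its cause.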
Keywords :
computer architecture; configuration management; data flow computing; data structures; graphics processing units; parallel programming; program debugging; program diagnostics; CCDB; CPUs; DELTA5D; GPUs; ORNL; OpenACC program; Titan machine; UPC program; accelerators; comparative debugging; comparison interface; data structure contents; dataflow; global runtime status; hybrid computer architectures; hybrid programs; hybrid version; parallel heterogeneous program highly; parallel hybrid CPU/GPU systems; reference version; software error tracing; stellarator particle simulation; Computer architecture; Conferences; Debugging; Distributed processing; Educational institutions; High performance computing; Information technology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
Type :
conf
DOI :
10.1109/IPDPSW.2014.207
Filename :
6969365