DocumentCode
167210
Title
HCW 2014 Keynote Talk
Author
Abramson, David
Author_Institution
Univ. of Queensland, Brisbane, QLD, Australia
fYear
2014
fDate
19-23 May 2014
Firstpage
6
Lastpage
6
Abstract
Summary form only given. CCDB implements a strategy called "Comparative Debugging", which helps trace software errors by comparing two executions of a program at the same time: one a reference version and the other a faulty version. Specifically, users write "assertions" that detect when data structure contents in the two executions diverge; using the dataflow of the code, it is then possible to locate the source of the divergence. Comparative debugging is effective at finding errors when code is migrated from one platform to another, which is of significant interest for hybrid computer architectures containing CPUs and accelerators. In this talk I will discuss the design and implementation of CCDB and show that it operates on highly parallel hybrid CPU/GPU systems. CCDB provides a uniform comparison interface that allows programmers to examine the global runtime status across different types of hybrid programs, including OpenACC and UPC programs. I will present a case study in finding errors using the hybrid version of the stellarator particle simulation DELTA5D on the Titan machine at ORNL. I will also illustrate that the debugger scales well and is effective with up to 10,000 nodes and 5,000 GPUs.
Keywords
computer architecture; configuration management; data flow computing; data structures; graphics processing units; parallel programming; program debugging; program diagnostics; CCDB; CPUs; DELTA5D; GPUs; ORNL; OpenACC program; Titan machine; UPC program; accelerators; comparative debugging; comparison interface; data structure contents; dataflow; global runtime status; hybrid computer architectures; hybrid programs; hybrid version; highly parallel heterogeneous program; parallel hybrid CPU/GPU systems; reference version; software error tracing; stellarator particle simulation; Computer architecture; Conferences; Debugging; Distributed processing; Educational institutions; High performance computing; Information technology
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location
Phoenix, AZ
Print_ISBN
978-1-4799-4117-9
Type
conf
DOI
10.1109/IPDPSW.2014.207
Filename
6969365