DocumentCode :
85808
Title :
Systematic Debugging Methods for Large-Scale HPC Computational Frameworks
Author :
Humphrey, Alan ; Qingyu Meng ; Berzins, Martin ; Caminha B. de Oliveira, Diego ; Rakamaric, Zvonimir ; Gopalakrishnan, Ganesh
Author_Institution :
Univ. of Utah, Salt Lake City, UT, USA
Volume :
16
Issue :
3
fYear :
2014
fDate :
May-June 2014
Firstpage :
48
Lastpage :
56
Abstract :
Parallel computational frameworks for high-performance computing are central to the advancement of simulation-based studies in science and engineering. Unfortunately, finding and fixing bugs in these frameworks can be extremely time consuming. Left unchecked, these bugs can drastically diminish the amount of new science that can be performed. This article presents a systematic study of the Uintah Computational Framework and approaches to debug it more incisively. A key insight is to leverage the modular structure of Uintah, which lends itself to systematic debugging. In particular, the authors have developed a new approach based on coalesced stack trace graphs (CSTG) that summarize the system behavior in terms of key control flows manifested through function invocation chains. They illustrate several scenarios for how CSTGs could help efficiently localize bugs, and present a case study of how they found and fixed a real Uintah bug using CSTGs.
Keywords :
graph theory; parallel programming; program debugging; CSTG; Uintah bug; Uintah computational framework; bugs fixing; bugs localization; coalesced stack trace graphs; engineering; function invocation chains; high-performance computing; large-scale HPC computational frameworks; modular structure; parallel computational frameworks; science; simulation-based studies; system behavior; systematic debugging methods; Computational modeling; Computer bugs; Debugging; Runtime; Scientific computing; Software development; Systematics; computational modeling and frameworks; debugging aids; parallel programming; reliability; scientific computing;
fLanguage :
English
Journal_Title :
Computing in Science & Engineering
Publisher :
IEEE
ISSN :
1521-9615
Type :
jour
DOI :
10.1109/MCSE.2014.11
Filename :
6729885