DocumentCode :
3575152
Title :
Online Performance Analysis: An Event-Based Workflow Design towards Exascale
Author :
Wagner, Michael ; Hilbrich, Tobias ; Brunst, Holger
Author_Institution :
Center for Inf. Services & High Performance Comput. (ZIH), Tech. Univ. Dresden, Dresden, Germany
fYear :
2014
Firstpage :
839
Lastpage :
846
Abstract :
Today, it is commonly accepted that speedup and efficiency are not granted automatically when developing or porting software for High Performance Computing (HPC) platforms. The reasons are manifold and actively investigated in the community. Software monitors are essential for these studies as they provide the raw performance data to be analyzed. We propose an online monitoring workflow for event-based performance analysis that takes into account the significant changes in system architecture towards the development of exascale supercomputers. Critical properties are: communication across large numbers of processing elements, limited I/O capabilities, and a decreasing memory-per-core ratio. We present a hierarchical data management and steering workflow that directly couples the application, measurement, and analysis processes, thus eliminating the need for extensive communication and buffering. The workflow is closely integrated with the native system communication API to enable best communication across processing elements. The memory issue is addressed with a new lossy hierarchical data compression technique for in-memory storage, intended for small, fixed-size buffers. Further, we abandon secondary storage to avoid potential I/O challenges. We demonstrate the feasibility of our design with a prototype implementation that features services for data collection, analysis, and runtime compression. Our evaluation extrapolates results obtained with the NAS Parallel Benchmarks at up to 2,048 processes to an exascale workflow.
Keywords :
application program interfaces; data analysis; data compression; parallel machines; software architecture; software performance evaluation; storage management; system monitoring; workflow management software; HPC; NAS parallel benchmarks; data analysis; data collection; event-based performance analysis; event-based workflow design; exascale supercomputers; hierarchical data compression technique; hierarchical data management; high performance computing platforms; in-memory storage; memory-per-core ratio; native system communication API; online monitoring workflow; online performance analysis; runtime compression; small fixed-size buffers; software monitors; software porting; steering workflow; system architecture; Instruments; Memory management; Monitoring; Performance analysis; Prototypes; Runtime; Software; GTI; OTFX; Online performance analysis; event tracing; online tracing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014 IEEE Intl Conf on
Print_ISBN :
978-1-4799-6122-1
Type :
conf
DOI :
10.1109/HPCC.2014.145
Filename :
7056843