Title :
Reliability-aware scalability models for high performance computing
Author :
Zheng, Ziming ; Lan, Zhiling
Author_Institution :
Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
fDate :
Aug. 31 2009-Sept. 4 2009
Abstract :
Scalability models are powerful analytical tools for evaluating and predicting the performance of parallel applications. Unfortunately, existing scalability models do not quantify failure impact and therefore cannot accurately account for application performance in the presence of failures. In this study, we extend two well-known models, namely Amdahl's law and Gustafson's law, by considering the impact of failures and the effect of fault tolerance techniques on applications. The derived reliability-aware models can be used to predict application scalability in failure-present environments and evaluate fault tolerance techniques. Trace-based simulations via real failure logs demonstrate that the newly developed models provide a better understanding of application performance and scalability in the presence of failures.
Keywords :
parallel processing; software performance evaluation; software reliability; Amdahl law; Gustafson law; high performance computing; parallel applications; performance evaluation; reliability-aware scalability; Analytical models; Application software; Checkpointing; Computer science; Fault tolerance; High performance computing; Large-scale systems; Performance analysis; Predictive models; Scalability
Conference_Title :
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-5011-4
Electronic_ISBN :
1552-5244
DOI :
10.1109/CLUSTR.2009.5289177