DocumentCode :
2953488
Title :
Applications resilience on clouds
Author :
Nguyên, Toàn ; Désidéri, Jean-Antoine ; Trifan, Laurentiu
Author_Institution :
Project OPALE, INRIA, St. Ismier, France
fYear :
2012
fDate :
2-6 July 2012
Firstpage :
60
Lastpage :
66
Abstract :
Cloud computing infrastructures support system and network fault tolerance. They transparently repair and prevent communication and software errors. They also allow duplication and migration of jobs and data to guard against hardware failures. However, only limited work has been done so far on application resilience, i.e., the ability to resume normal execution after errors and abnormal executions in distributed environments and clouds. This paper addresses open issues and solutions for application error detection and management. It also gives an overview of a testbed used to design, deploy, execute, monitor, restart and resume distributed applications on cloud infrastructures in case of failure.
Keywords :
cloud computing; software fault tolerance; abnormal executions; application errors detection; application errors management; applications resilience; cloud computing infrastructures; communication errors; distributed environments; hardware failures; network fault tolerance; software errors; Checkpointing; Fault tolerance; Fault tolerant systems; Hardware; Resilience; Software; Transient analysis; Cloud Computing; High-Performance Computing; Resilience; Scientific Applications; Workflows;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing and Simulation (HPCS), 2012 International Conference on
Conference_Location :
Madrid
Print_ISBN :
978-1-4673-2359-8
Type :
conf
DOI :
10.1109/HPCSim.2012.6266891
Filename :
6266891