Regional Information Center for Science and Technology - High Performance Computational Grids Fault Tolerance at System Level

DocumentCode :

2313262

Title :

High Performance Computational Grids Fault Tolerance at System Level

Author :

Mujumdar, Manik ; Bheevgade, Meenakshi ; Malik, Latesh ; Patrikar, Rajendra

Author_Institution :

G.H. Raisoni Coll. of Eng., Nagpur

fYear :

2008

fDate :

16-18 July 2008

Firstpage :

379

Lastpage :

383

Abstract :

Many complex scientific and mathematical applications take a long time to complete. To deal with this issue, parallelization is widely used. Distributing an application onto several machines is one of the key aspects of grid computing. This paper focuses on a checkpoint/restart mechanism used to overcome the problem of job suspension at a failed node in a computational Grid. The ability to checkpoint a running application and restart it later can provide several benefits: fault recovery by rolling an application back to a previous checkpoint, advanced resource sharing, better application response time (by restarting applications from checkpoints instead of from scratch), improved system utilization, efficient high performance computing, and improved service availability.
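The abstract describes checkpoint/restart only at a high level. As a rough illustration of the general idea, the following Python sketch shows application-level checkpointing: a job periodically pickles its state to disk and, after a simulated failure, resumes from the last checkpoint instead of from scratch. This is an assumption-laden sketch, not the authors' system-level mechanism. The names `save_checkpoint`, `load_checkpoint`, `run_job`, `checkpoint_every`, and the `fail_at` failure injection are all hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    # os.replace is atomic on the same filesystem, so a crash mid-write
    # never leaves a half-written checkpoint at `path`.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run_job(path, total_steps, checkpoint_every=10, fail_at=None):
    """Hypothetical long-running job: sums 0..total_steps-1 with periodic checkpoints.

    If a checkpoint exists, the job resumes from it. `fail_at` simulates a node
    failure by raising just before that step would be processed.
    """
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError(f"simulated node failure at step {fail_at}")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "job.ckpt")
        try:
            run_job(ckpt, 100, fail_at=25)
        except RuntimeError as err:
            print("first run:", err)
        print("last checkpoint:", load_checkpoint(ckpt))  # {'step': 20, 'acc': 190}
        print("result after restart:", run_job(ckpt, 100))  # 4950
```

In this sketch the restarted run loses only the work done since the last checkpoint (steps 20 to 24), rather than all 25 steps. The choice of `checkpoint_every` trades checkpointing overhead against the amount of recomputation after a failure.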

Keywords :

fault tolerant computing; grid computing; check point-restart mechanism; fault recovery; fault tolerance; high performance computational grids; job suspension; parallelization; Access protocols; Application software; Concurrent computing; Distributed computing; Fault tolerant systems; Grid computing; High energy physics instrumentation computing; High performance computing; Pervasive computing; Resource management; Checkpoint/Restart; Cluster; Computational Grid; Fault Tolerance; High performance Computing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Emerging Trends in Engineering and Technology, 2008. ICETET '08. First International Conference on

Conference_Location :

Nagpur, Maharashtra

Print_ISBN :

978-0-7695-3267-7

Electronic_ISBN :

978-0-7695-3267-7

Type :

conf

DOI :

10.1109/ICETET.2008.21

Filename :

4579928

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2313262