DocumentCode :
2915470
Title :
Extending an SSI Cluster for Resource Discovery in Grid Computing
Author :
Echaiz, Javier ; Ardenghi, Jorge
Author_Institution :
Departamento de Ciencias e Ingenieria de la Computacion, Univ. Nacional del Sur, Bahia Blanca
fYear :
2006
fDate :
Oct. 2006
Firstpage :
287
Lastpage :
293
Abstract :
Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations can be challenging due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Hence, information services are a vital part of any grid software infrastructure, providing fundamental mechanisms for discovery and monitoring, and thus for planning and adapting application behavior. This paper proposes a resource discovery system for grid computing with fault-tolerant capabilities, built on an SSI clustering operating system. The proposed system uses dynamic leader-determination and registration mechanisms to automatically recover from node and network failures. The system is centralized and uses dynamic (or soft-state) registration to detect and recover from failures. Provisional or backup leader determination provides tolerance and recovery in the event of the leader node failing. The system was tested against a control network modeled after existing grid computing resource discovery components, such as the Globus Monitoring and Discovery System (MDS). In various failure scenarios, the proposed system showed better resilience and performance than the control system.
Keywords :
fault tolerant computing; grid computing; information services; operating systems (computers); Globus monitoring and discovery system; SSI clustering operating system; dynamic leader-determination; fault tolerance; grid software infrastructure; high performance computing; information service; large-scale resource sharing; registration mechanism; resource discovery; Application software; Automatic control; Condition monitoring; Distributed computing; Fault tolerant systems; Large-scale systems; Operating systems; Resilience; System testing; grid operating systems
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Grid and Cooperative Computing, 2006. GCC 2006. Fifth International Conference
Conference_Location :
Hunan
Print_ISBN :
0-7695-2694-2
Type :
conf
DOI :
10.1109/GCC.2006.43
Filename :
4031470