DocumentCode
1434390
Title
Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery
Author
Guo, Suchang ; Huang, Hong-Zhong ; Wang, Zhonglai ; Xie, Min
Author_Institution
Sch. of Mech., Electron., & Ind. Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
Volume
60
Issue
1
fYear
2011
fDate
3/1/2011 12:00:00 AM
Firstpage
263
Lastpage
274
Abstract
Considerable research has been devoted to developing tools and techniques for grid systems, yet some important issues, such as grid service reliability and task scheduling in the grid, have not been sufficiently studied. For grid services whose large subtasks require time-consuming computation, service reliability can be rather low. To address this problem, this paper introduces a Local Node Fault Recovery (LNFR) mechanism into grid systems and presents an in-depth study of grid service reliability modeling and analysis under this kind of fault recovery. To make the LNFR mechanism practical, constraints on the lifetimes of subtasks and on the number of recoveries performed in grid nodes are introduced, and grid service reliability models are developed under these practical constraints. Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented, and an ant colony optimization (ACO) algorithm is developed to solve it effectively. A numerical example illustrates the influence of fault recovery on grid service reliability and demonstrates the high efficiency of ACO in solving the grid task scheduling problem.
Keywords
fault tolerance; grid computing; optimisation; scheduling; software reliability; ACO efficiency; LNFR mechanism; ant colony optimization; grid node; grid service reliability modeling; grid system; local node fault recovery mechanism; multiobjective task scheduling optimization; optimal task scheduling; time consuming computation; fault recovery; grid service reliability; recoverability; task scheduling
fLanguage
English
Journal_Title
IEEE Transactions on Reliability
Publisher
IEEE
ISSN
0018-9529
Type
jour
DOI
10.1109/TR.2010.2104190
Filename
5699967