DocumentCode
3322743
Title
Fully Distributed and Fault Tolerant Task Management Based on Diffusions
Author
Bui, Alain ; Flauzac, Olivier ; Rabat, Cyril
Author_Institution
Lab. PRiSM, Univ. de Versailles-St-Quentin-en-Yvelines, Versailles
fYear
2009
fDate
18-20 Feb. 2009
Firstpage
355
Lastpage
360
Abstract
Task management is a critical component of computational grids. The aim is to assign tasks to nodes according to a global scheduling policy and a view of the nodes' local resources. A peer-to-peer approach to task management gives the grid better scalability and higher fault tolerance. However, mechanisms are needed to avoid computing replicated tasks, which reduces efficiency and increases the load on nodes. Likewise, these mechanisms must limit the number of exchanged messages to avoid overloading the network. In previous work, we proposed two task management methods, called active and passive. These methods are based on a random walk: they are fully distributed and fault tolerant. Each node owns a local task state set that is updated by a random walk, and each node is in charge of its local assignment. Here, we propose three methods to improve the efficiency of the active method. These new methods are based on a circulating word. The nodes' local task state sets are updated by periodic diffusions along trees built from the circulating word. In particular, we show that these methods increase the efficiency of the active method: they produce fewer replicated tasks. These three methods are also fully distributed and fault tolerant. In addition, the circulating word can be exploited for other applications such as resource management or node synchronization.
Keywords
fault tolerant computing; grid computing; peer-to-peer computing; random processes; scheduling; task analysis; trees (mathematics); active task management; computational grid; fault tolerant task management; fully distributed task management; global scheduling policy; passive task management; peer-to-peer approach; periodical tree diffusion; random walk; Computer network management; Computer networks; Concurrent computing; Distributed computing; Fault tolerance; Grid computing; Peer to peer computing; Processor scheduling; Resource management; Scalability; Computational Grid; Peer-to-peer; Random Walks; Task Management;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on
Conference_Location
Weimar
ISSN
1066-6192
Print_ISBN
978-0-7695-3544-9
Type
conf
DOI
10.1109/PDP.2009.51
Filename
4912954