DocumentCode :
3322743
Title :
Fully Distributed and Fault Tolerant Task Management Based on Diffusions
Author :
Bui, Alain ; Flauzac, Olivier ; Rabat, Cyril
Author_Institution :
Lab. PRiSM, Univ. de Versailles-St-Quentin-en-Yvelines, Versailles
fYear :
2009
fDate :
18-20 Feb. 2009
Firstpage :
355
Lastpage :
360
Abstract :
Task management is a critical component of computational grids. The aim is to assign tasks to nodes according to a global scheduling policy and a view of the nodes' local resources. A peer-to-peer approach to task management gives the grid better scalability and higher fault tolerance. However, mechanisms are needed to avoid computing replicated tasks, which reduces efficiency and increases the load on nodes. Likewise, these mechanisms must limit the number of exchanged messages to avoid overloading the network. In previous work, we proposed two task management methods, called active and passive. These methods are based on a random walk: they are fully distributed and fault tolerant. Each node owns a local set of task states that is updated by a random walk, and each node is in charge of its local assignment. Here, we propose three methods to improve the efficiency of the active method. These new methods are based on a circulating word. The nodes' local task-state sets are updated by periodic diffusions along trees built from the circulating word. In particular, we show that these methods increase the efficiency of the active method: they produce fewer replicated tasks. These three methods are also fully distributed and fault tolerant. In addition, the circulating word can be exploited for other applications such as resource management or node synchronization.
Keywords :
fault tolerant computing; grid computing; peer-to-peer computing; random processes; scheduling; task analysis; trees (mathematics); active task management; computational grid; fault tolerant task management; fully distributed task management; global scheduling policy; passive task management; peer-to-peer approach; periodical tree diffusion; random walk; Computer network management; Computer networks; Concurrent computing; Distributed computing; Fault tolerance; Grid computing; Peer to peer computing; Processor scheduling; Resource management; Scalability; Computational Grid; Peer-to-peer; Random Walks; Task Management;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on
Conference_Location :
Weimar
ISSN :
1066-6192
Print_ISBN :
978-0-7695-3544-9
Type :
conf
DOI :
10.1109/PDP.2009.51
Filename :
4912954