• DocumentCode
    3322743
  • Title

    Fully Distributed and Fault Tolerant Task Management Based on Diffusions

  • Author

    Bui, Alain ; Flauzac, Olivier ; Rabat, Cyril

  • Author_Institution
    Lab. PRiSM, Univ. de Versailles-St-Quentin-en-Yvelines, Versailles
  • fYear
    2009
  • fDate
    18-20 Feb. 2009
  • Firstpage
    355
  • Lastpage
    360
  • Abstract
    Task management is a critical component of computational grids. The aim is to assign tasks to nodes according to a global scheduling policy and a view of each node's local resources. A peer-to-peer approach to task management gives the grid better scalability and higher fault tolerance. However, mechanisms are needed to avoid computing replicated tasks, which reduces efficiency and increases the load on nodes. Likewise, these mechanisms must limit the number of exchanged messages to avoid overloading the network. In previous work, we proposed two task management methods, called active and passive. These methods are based on a random walk: they are fully distributed and fault tolerant. Each node owns a local set of task states, updated by a random walk, and each node is in charge of local assignment. Here, we propose three methods to improve the efficiency of the active method. These new methods are based on a circulating word. The nodes' local task-state sets are updated by periodic diffusions along trees built from the circulating word. In particular, we show that these methods increase the efficiency of the active method: they produce fewer replicated tasks. These three methods are also fully distributed and fault tolerant. In addition, the circulating word can be exploited for other applications such as resource management or node synchronization.
  • Keywords
    fault tolerant computing; grid computing; peer-to-peer computing; random processes; scheduling; task analysis; trees (mathematics); active task management; computational grid; fault tolerant task management; fully distributed task management; global scheduling policy; passive task management; peer-to-peer approach; periodical tree diffusion; random walk; Computer network management; Computer networks; Concurrent computing; Distributed computing; Fault tolerance; Grid computing; Peer to peer computing; Processor scheduling; Resource management; Scalability; Computational Grid; Peer-to-peer; Random Walks; Task Management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on
  • Conference_Location
    Weimar
  • ISSN
    1066-6192
  • Print_ISBN
    978-0-7695-3544-9
  • Type

    conf

  • DOI
    10.1109/PDP.2009.51
  • Filename
    4912954