• DocumentCode
    3146609
  • Title

    Fault-Tolerant Mechanism for Hierarchical Branch and Bound Algorithm

  • Author

    Bendjoudi, A. ; Melab, N. ; Talbi, E-G

  • Author_Institution
    Centre de Rech. sur l´´Inf. Sci. et Tech. CERIST, DTISI, Algiers, Algeria
  • fYear
    2011
  • fDate
    16-20 May 2011
  • Firstpage
    1806
  • Lastpage
    1814
  • Abstract
    Solving exactly large instances of Combinatorial Optimization Problems emph{(COPs)} using Branch and Bound emph{(B&B) }algorithms requires a huge amount of computing resources. These resources can be offered by computational grids and the scalability can be achieved using Hierarchical Master/Worker-based B&B pushing the limits of the traditional Master/Worker paradigm. However, the resources offered by grids are most of the time unreliable, volatile, and heterogeneous. Therefore, they must take into account fault tolerance. In this paper, we present FTH-B&B, a fault tolerant hierarchical B&B, in order to deal with the fault tolerance issue. It is composed of several fault tolerant Master/Worker-based sub-B&Bs organized hierarchically into groups and perform independently fault tolerant mechanism. Beside, a fault recovery mechanism is introduced to recover and avoid redundant exploration of sub-problems in case of failures. In addition, we propose a mechanism to maintain the hierarchy safe and balanced during the lifetime of the algorithm. Our algorithm is applied to the Flow-Shop scheduling problem emph{(FSP)} and implemented on top of the ProActive grid middleware. It has been promisingly experimented on the Grid´5000 French nation-wide grid and shows its ability to remain efficient even in presence of failures.
  • Keywords
    fault tolerant computing; flow shop scheduling; grid computing; middleware; system recovery; tree searching; French nation-wide grid; ProActive grid middleware; combinatorial optimization problems; computational grids; failures; fault recovery mechanism; fault-tolerant mechanism; flow-shop scheduling problem; hierarchical branch and bound algorithm; hierarchical master-worker; scalability; Algorithm design and analysis; Fault tolerance; Fault tolerant systems; Heart beat; Middleware; Scalability;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-61284-425-1
  • Electronic_ISBN
    1530-2075
  • Type

    conf

  • DOI
    10.1109/IPDPS.2011.339
  • Filename
    6009049