• DocumentCode
    2093396
  • Title

    A Fault-Tolerant Workflow Mapping Algorithm under End-to-End Delay Constraint

  • Author

    Cao, Fei ; Zhu, Mengxia

  • Author_Institution
    Comput. Sci. Dept., Southern Illinois Univ., Carbondale, IL, USA
  • fYear
    2011
  • fDate
    2-4 Sept. 2011
  • Firstpage
    575
  • Lastpage
    580
  • Abstract
    In recent years, distributed scientific workflows structured as directed acyclic graph (DAG) are widely applied to various research areas to enable science discovery. In a heterogeneous distributed computing environment, computing nodes and communication link failures are inevitable and could have an negative impact on the applications. Efficient mapping and scheduling algorithms to achieve both the low end-to-end delay and the high reliability have been attracting active research interests. However, it is not possible to optimize both objectives due to conflicting requirements. We proposed a mapping and scheduling algorithm, namely Fault-tolerant Workflow Mapping Algorithm under End-to-end Delay Constraint (FMEDC), which considers both the execution delay and reliability in a two-step procedure. In the first step, a hybrid mapping algorithm combining recursive critical path searching and layer-based priority techniques (HCPT) is developed to achieve the minimum end-to-end delay(MED) along the critical path. The computation and communication failure rates are factored into equation to compute the reliability-enhanced execution time. In the second step, the modules on the non-critical paths are rescheduled onto nodes in a way such that system reliability can be improved under the MED constraint calculated in the first step. Experimental results show that our algorithm achieves the lowest MED but highest reliability under the same time constraints compared with some representative algorithms.
  • Keywords
    directed graphs; distributed processing; scheduling; software fault tolerance; communication link failure; computing nodes failure; directed acyclic graph; distributed scientific workflow; end-to-end delay constraint; fault-tolerant workflow mapping algorithm; heterogeneous distributed computing environment; hybrid mapping algorithm; layer-based priority technique; minimum end-to-end delay; recursive critical path searching; scheduling algorithm; science discovery; system reliability; Computer network reliability; Cost function; Delay; Distributed computing; Heuristic algorithms; Processor scheduling; Reliability; directed acyclic graph; distributed computing; end-to-end delay; mapping and scheduling; reliability;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference on
  • Conference_Location
    Banff, AB
  • Print_ISBN
    978-1-4577-1564-8
  • Electronic_ISBN
    978-0-7695-4538-7
  • Type

    conf

  • DOI
    10.1109/HPCC.2011.81
  • Filename
    6063042