• DocumentCode
    1484640
  • Title
    QoS-Aware Fault-Tolerant Scheduling for Real-Time Tasks on Heterogeneous Clusters
  • Author
    Zhu, Xiaomin ; Qin, Xiao ; Qiu, Meikang
  • Author_Institution
    Sci. & Technol. on Inf. Syst. Eng. Lab., Nat. Univ. of Defense Technol., Changsha, China
  • Volume
    60
  • Issue
    6
  • fYear
    2011
  • fDate
    6/1/2011 12:00:00 AM
  • Firstpage
    800
  • Lastpage
    812
  • Abstract
    Fault-tolerant scheduling plays a significant role in improving the reliability of clusters. Although many fault-tolerant scheduling algorithms have been proposed for real-time tasks in parallel and distributed systems, they do not take the quality of service (QoS) requirements of tasks into account. This paper presents QAFT, a fault-tolerant scheduling algorithm for real-time tasks with QoS requirements on heterogeneous clusters that can tolerate the permanent failure of one node at a time. To improve system flexibility, reliability, schedulability, and resource utilization, QAFT either advances the start times of primary copies and delays the start times of backup copies so that backup copies can adopt the passive execution scheme, or minimizes the time during which the primary and backup copies of a task execute simultaneously, which improves resource utilization. QAFT adaptively adjusts the QoS levels of tasks and the execution schemes of backup copies to attain high system flexibility. Furthermore, we employ a backup-copy overlapping technique. The latest start time of backup copies and its constraints are analyzed and discussed. We conduct extensive experiments to compare QAFT with two existing schemes, NOQAFT and DYFARS. Experimental results show that QAFT achieves significantly better scheduling quality than NOQAFT and DYFARS.
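The abstract names the primary-backup ideas that QAFT builds on (passive versus active backups, latest start time, backup overlapping) but does not spell out QAFT's actual rules. The following Python sketch is a minimal, hypothetical illustration of those general ideas, not the QAFT algorithm itself. Assumptions: the names `Copy`, `is_passive`, `place_backup`, `overlap_time`, and `backups_may_overlap` are invented for this example; a backup is placed at its latest start time (deadline minus its execution time on the backup node); node availability and QoS-level adjustment are not modeled; and at most one node fails at a time, which is what makes backup overlapping safe.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Copy:
    """One copy (primary or backup) of a task placed on a node (hypothetical model)."""
    node: int         # node the copy is placed on
    start: float      # scheduled start time
    exec_time: float  # execution time on that node (nodes are heterogeneous)

    @property
    def finish(self) -> float:
        return self.start + self.exec_time


def is_passive(primary: Copy, backup: Copy) -> bool:
    """A backup is passive if it starts only after its primary has finished,
    so it needs to run only when the primary's node fails."""
    return primary.finish <= backup.start


def place_backup(primary: Copy, deadline: float, backup_node: int,
                 backup_exec_time: float,
                 arrival: float = 0.0) -> Optional[Tuple[Copy, str]]:
    """Place a backup as late as possible and report its execution scheme.

    Returns (backup, "passive" | "active"), or None if no start time meets
    the deadline. Assumes the backup node is free at the chosen time.
    """
    if backup_node == primary.node:
        raise ValueError("primary and backup must be on different nodes")
    # Latest start time: the backup must still finish by the deadline.
    latest_start = deadline - backup_exec_time
    if latest_start < arrival:
        return None
    # Delaying the backup either makes it passive or, if it must be active,
    # keeps its simultaneous execution with the primary as short as possible.
    backup = Copy(backup_node, latest_start, backup_exec_time)
    scheme = "passive" if is_passive(primary, backup) else "active"
    return backup, scheme


def overlap_time(primary: Copy, backup: Copy) -> float:
    """Time during which the primary and backup would execute simultaneously."""
    return max(0.0, min(primary.finish, backup.finish) - max(primary.start, backup.start))


def backups_may_overlap(p1: Copy, b1: Copy, p2: Copy, b2: Copy) -> bool:
    """Two passive backups may share a time slot on the same node if their
    primaries are on different nodes: with at most one node failing at a
    time, at most one of the two backups will ever have to run."""
    return (b1.node == b2.node and p1.node != p2.node
            and is_passive(p1, b1) and is_passive(p2, b2))
```

For example, a primary on node 0 running from time 0 to 4, with a deadline of 10 and a 5-unit backup on node 1, yields a passive backup starting at 5; tightening the deadline to 8 forces an active backup that overlaps the primary for 1 time unit.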
  • Keywords
    fault tolerant computing; quality of service; scheduling; QoS-aware fault-tolerant scheduling; distributed systems; heterogeneous clusters; parallel systems; passive execution scheme; real-time task scheduling; Fault tolerance; Fault tolerant systems; Heuristic algorithms; Quality of service; Real time systems; Scheduling algorithm; Heterogeneous clusters; fault tolerance; heuristic; quality of service (QoS); real-time
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2011.68
  • Filename
    5740856