• DocumentCode
    167615
  • Title

    A Task Scheduling Algorithm Based on Replication for Maximizing Reliability on Heterogeneous Computing Systems

  • Author

    Shuli Wang ; Kenli Li ; Jing Mei ; Keqin Li ; Yan Wang

  • Author_Institution
    Coll. of Inf. Sci. & Eng., Hunan Univ., Changsha, China
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    1562
  • Lastpage
    1571
  • Abstract
    Over the past several years, heterogeneous computing (HC) systems have become more competitive as commercial computing platforms than homogeneous systems. With the growing scale of HC systems, network failures become inevitable. To achieve high performance, communication reliability should be considered when designing reliability-aware task scheduling algorithms. In this paper, we propose a new algorithm called RMSR (Replication-based scheduling for Maximizing System Reliability), which incorporates task communication into system reliability. To maximize communication reliability, we propose an improved algorithm that searches all optimal-reliability communication paths for the current tasks. During the task replication phase, the task reliability threshold is determined by users and each task has dynamic replicas. Our comparative studies based on randomly generated graphs show that the RMSR algorithm outperforms existing scheduling algorithms in terms of system reliability. Several factors affecting performance are also analyzed.
  • Keywords
    graph theory; optimisation; redundancy; telecommunication network reliability; RMSR algorithm; communication reliability; heterogeneous computing system; network failure; optimal reliability communication path; randomly generated graph; reliability-aware task scheduling algorithm; replication-based scheduling for maximizing system reliability; task reliability threshold; task replication phase; Algorithm design and analysis; Computational modeling; Equations; Mathematical model; Program processors; Reliability; Scheduling algorithms; Directed acyclic graph; Heterogeneous computing systems; Reliability-aware scheduling; Replication-based algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2014.175
  • Filename
    6969562