• DocumentCode
    3475348
  • Title
    Improving network performance through task duplication for parallel applications on clusters
  • Author
    Qin, Xiao
  • Author_Institution
    Dept. of Comput. Sci., New Mexico Inst. of Min. & Technol., Socorro, NM, USA
  • fYear
    2005
  • fDate
    7-9 April 2005
  • Firstpage
    35
  • Lastpage
    42
  • Abstract
    While data replication is widely used in clusters to provide fault tolerance, it can heavily stress communication networks and degrade the overall performance of parallel applications. This degradation is particularly unacceptable for disk-write-intensive applications. As a result, data duplication management for parallel applications running on clusters is a significant and urgent challenge. This paper presents the design, implementation, and evaluation of TUFF, a network-aware task duplication management system in which redundant data is regenerated by corresponding duplicate tasks rather than replicated directly over the network. In addition, TUFF can improve the availability of parallel applications, because it allows two replicas of each I/O-intensive task to execute on two different nodes. We have implemented and evaluated TUFF using extensive simulations under a diverse set of workload conditions. Experimental results show that TUFF improves the overall performance of parallel applications running on clusters by efficiently reducing network resource consumption.
  • Keywords
    computer network management; computer network reliability; fault tolerant computing; parallel processing; performance evaluation; workstation clusters; I/O-intensive task; TUFF; availability; data duplication management; disk-write-intensive application; fault tolerance; network clusters; parallel applications; performance degradation; stress communication networks; Application software; Communication networks; Computer science; Costs; Degradation; Fault tolerance; Fault tolerant systems; File systems; Middleware; Redundancy
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Performance, Computing, and Communications Conference, 2005. IPCCC 2005. 24th IEEE International
  • ISSN
    1097-2641
  • Print_ISBN
    0-7803-8991-3
  • Type
    conf
  • DOI
    10.1109/PCCC.2005.1460511
  • Filename
    1460511