• DocumentCode
    1135189
  • Title
    Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison
  • Author
    Dobber, Menno; van der Mei, Rob; Koole, Ger
  • Author_Institution
    Dept. of Math., Vrije Univ. Amsterdam, Amsterdam
  • Volume
    20
  • Issue
    2
  • fYear
    2009
  • Firstpage
    207
  • Lastpage
    218
  • Abstract
    Global-scale grids provide a massive source of processing power, offering the means to support processor-intensive parallel applications. The strong burstiness and unpredictability of the available processing and network resources create a strong need to make applications robust against the dynamics of grid environments. The two techniques best suited to coping with the dynamic nature of the grid are dynamic load balancing (DLB) and job replication (JR). In this paper, we analyze and compare the effectiveness of these two approaches by means of trace-driven simulations. We observe that there exists an easy-to-measure statistic Y and a corresponding threshold value Y*, such that DLB consistently outperforms JR when Y > Y*, whereas the reverse is true for Y < Y*. Based on this observation, we propose a simple, easy-to-implement approach, referred to throughout as the DLB/JR method, that makes dynamic decisions about whether to use DLB or JR. Extensive simulations based on a large set of real data monitored in a global-scale grid show that our DLB/JR method consistently performs at least as well as both DLB and JR in all circumstances, which makes it highly robust against the unpredictable nature of global-scale grids.
  • Keywords
    grid computing; resource allocation; dynamic load balancing; global-scale grid environment; job replication; network resources; parallel applications; trace-driven simulations; Communication/Networking and Information Technology; Parallel Architectures; Parallelism and concurrency; Performance; Performance Analysis and Design Aids; Performance and Reliability; Performance of Systems
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2008.61
  • Filename
    4492771
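The DLB/JR selection rule summarized in the abstract can be sketched minimally in Python, under stated assumptions. The abstract does not define the statistic Y or give a value for Y*, so both are treated as plain numeric inputs here. The names Strategy, choose_strategy, and schedule are hypothetical, and the tie case Y = Y* (which the abstract does not cover) falls back to JR purely as an assumption. This is an illustration of the threshold rule, not the authors' implementation.

```python
from enum import Enum


class Strategy(Enum):
    """The two techniques compared in the paper."""
    DLB = "dynamic load balancing"
    JR = "job replication"


def choose_strategy(y: float, y_star: float) -> Strategy:
    """Pick DLB or JR from a measured statistic Y and a threshold Y*.

    Rule from the abstract: DLB when Y > Y*, JR when Y < Y*.
    Assumption: the tie Y == Y* is unspecified, so this sketch returns JR.
    """
    return Strategy.DLB if y > y_star else Strategy.JR


def schedule(measurements: list[float], y_star: float) -> list[Strategy]:
    """Re-apply the rule each time a new value of Y is measured (hypothetical loop)."""
    return [choose_strategy(y, y_star) for y in measurements]


if __name__ == "__main__":
    # Illustrative values only; not taken from the paper's traces.
    print(schedule([0.2, 1.5, 0.9], y_star=1.0))
```

Re-evaluating the rule whenever Y is re-measured is what makes the choice dynamic; how Y is computed and how often it is sampled are left open here because the abstract does not specify them.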