• DocumentCode
    652261
  • Title
    An Optimal Algorithm for Extreme Scale Job Launching
  • Author
    Goehner, Joshua D. ; Groves, Taylor L. ; Arnold, Donna ; Ahn, Dong H. ; Lee, Gregory L.
  • Author_Institution
    Rogue Wave Software, Inc., Natick, MA, USA
  • fYear
    2013
  • fDate
    16-18 July 2013
  • Firstpage
    1115
  • Lastpage
    1122
  • Abstract
    All distributed software systems execute a bootstrapping phase upon instantiation. During this phase, the constituent processes of the system are deployed onto a set of computational nodes, and initialization information is disseminated among these processes. However, with the growing trend toward high-end systems with very large numbers of compute cores, the bootstrapping phase is increasingly becoming a bottleneck. This presents significant challenges for several key elements of extreme-scale machines: the usefulness of interactive run-time tools and the efficiency of newly emerging computational models, such as many-task computing and uncertainty quantification runs, are increasingly limited by inefficient bootstrapping. In this paper, we propose a novel algorithm that determines an optimal bootstrapping strategy. Our algorithm is based on a process launch performance model and finds the optimal strategy for a specified set of nodes. We prove that our process launching strategy is optimal and support this result with empirical comparisons against other standard strategies. Lastly, we show that our algorithm can decrease bootstrapping time in a real software system by up to 50%.
  • Keywords
    computer bootstrapping; distributed processing; information dissemination; interactive systems; bootstrapping phase; computational nodes; distributed software systems; extreme scale job launching; high-end systems; initialization information dissemination; interactive run-time tools; optimal bootstrapping strategy; optimal process launching strategy; process launch performance model; Computational modeling; Data models; Greedy algorithms; Mathematical model; Software; Software algorithms; Topology; bootstrapping; job launching; large scale systems software; resource and job management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • Type
    conf
  • DOI
    10.1109/TrustCom.2013.135
  • Filename
    6680956