• DocumentCode
    451267
  • Title
    A Self-Organizing Flock of Condors
  • Author
    Butt, Ali Raza ; Zhang, Rongmei ; Hu, Y. Charlie
  • Author_Institution
    Purdue University, West Lafayette, IN
  • fYear
    2003
  • fDate
    15-21 Nov. 2003
  • Firstpage
    42
  • Lastpage
    42
  • Abstract
    Condor provides high-throughput computing by leveraging idle cycles on off-the-shelf desktop machines. It also supports flocking, a mechanism for sharing resources among Condor pools. Because Condor pools distributed over a wide area can have dynamically changing availability and sharing preferences, the current flocking mechanism, which is based on static configurations, can limit the potential for sharing resources across Condor pools. This paper presents a technique for resource discovery in distributed Condor pools using peer-to-peer mechanisms that are self-organizing, fault-tolerant, scalable, and locality-aware. Locality-awareness guarantees that applications are not shipped across long distances when nearby resources are available. Measurements using a synthetic job trace show that self-organized flocking reduces the maximum job wait time in the queue for a heavily loaded pool by a factor of 10 compared with no flocking. Simulations of 1000 Condor pools are also presented, and the results confirm that our technique discovers and utilizes nearby resources in the physical network.
  • Keywords
    Availability; Fault tolerance; Peer to peer computing; Permission; Processor scheduling; Resource management; Robustness; Routing; Throughput; Time measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Supercomputing, 2003 ACM/IEEE Conference
  • Print_ISBN
    1-58113-695-1
  • Type
    conf
  • DOI
    10.1109/SC.2003.10031
  • Filename
    1592945