DocumentCode
451267
Title
A Self-Organizing Flock of Condors
Author
Butt, Ali Raza ; Zhang, Rongmei ; Hu, Y. Charlie
Author_Institution
Purdue University, West Lafayette, IN
fYear
2003
fDate
15-21 Nov. 2003
Firstpage
42
Lastpage
42
Abstract
Condor provides high-throughput computing by leveraging idle cycles on off-the-shelf desktop machines. It also supports flocking, a mechanism for sharing resources among Condor pools. Since Condor pools distributed over a wide area can have dynamically changing availability and sharing preferences, the current flocking mechanism based on static configurations can limit the potential of sharing resources across Condor pools. This paper presents a technique for resource discovery in distributed Condor pools using peer-to-peer mechanisms that are self-organizing, fault-tolerant, scalable, and locality-aware. Locality-awareness guarantees that applications are not shipped across long distances when nearby resources are available. Measurements using a synthetic job trace show that self-organized flocking reduces the maximum job wait time in queue for a heavily loaded pool by a factor of 10 compared with no flocking. Simulations of 1000 Condor pools are also presented, and the results confirm that our technique discovers and utilizes nearby resources in the physical network.
Keywords
Availability; Fault tolerance; Peer to peer computing; Permission; Processor scheduling; Resource management; Robustness; Routing; Throughput; Time measurement;
fLanguage
English
Publisher
IEEE
Conference_Titel
Supercomputing, 2003 ACM/IEEE Conference
Print_ISBN
1-58113-695-1
Type
conf
DOI
10.1109/SC.2003.10031
Filename
1592945