Regional Information Center for Science and Technology - Schedule first, manage later: Network-aware load balancing

DocumentCode :

623620

Title :

Schedule first, manage later: Network-aware load balancing

Author :

Nahir, Amir ; Orda, Ariel ; Raz, Danny

Author_Institution :

Dept. of Comput. Sci. Technion, Technion - Israel Inst. of Technol., Haifa, Israel

fYear :

2013

fDate :

14-19 April 2013

Firstpage :

510

Lastpage :

514

Abstract :

Load balancing in large distributed server systems is a complex optimization problem of critical importance in cloud systems and data centers. Existing schedulers often incur a high overhead in communication when collecting the data required to make the scheduling decision, hence delaying the job request on its way to the executing server. We propose a novel scheme that incurs no communication overhead between the users and the servers upon job arrival, thus removing any scheduling overhead from the job's critical path. Our approach is based on creating several replicas of each job and sending each replica to a different server. Upon the arrival of a replica to the head of the queue at its server, the latter signals the servers holding replicas of that job, so as to remove them from their queues. We show, through analysis and simulations, that this scheme improves the expected queuing overhead over traditional schemes by a factor of 9 (or more) under various load conditions. In addition, we show that our scheme remains efficient even when the inter-server signal propagation delay is significant (relative to the job's execution time). We provide heuristic solutions to the performance degradation that occurs in such cases and show, by simulations, that they efficiently mitigate the detrimental effect of propagation delays. Finally, we demonstrate the efficiency of our proposed scheme in a real-world environment by implementing a load balancing system based on it, deploying the system on the Amazon Elastic Compute Cloud (EC2), and measuring its performance.
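The replicate-then-cancel idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Replica`, `Server`, `dispatch`, and the parameter `d` are hypothetical, servers are chosen uniformly at random as an assumption, and cancellation signals are assumed to arrive instantly (zero inter-server propagation delay), so the delay heuristics mentioned in the abstract are not modeled.

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a job; it carries references to the peers holding the other copies."""
    job_id: str
    peers: list = field(default_factory=list)


class Server:
    """A server with a FIFO queue of replicas (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def enqueue(self, replica):
        self.queue.append(replica)

    def cancel(self, job_id):
        # Drop any queued replica of this job; a no-op if none is present.
        self.queue = deque(r for r in self.queue if r.job_id != job_id)

    def start_next(self):
        """Take the replica at the head of the queue and signal its peers."""
        if not self.queue:
            return None
        replica = self.queue.popleft()
        # Assumption: the signal is delivered instantly to every peer.
        for peer in replica.peers:
            peer.cancel(replica.job_id)
        return replica.job_id


def dispatch(job_id, servers, d=2, rng=random):
    """Send d replicas of a job to d distinct servers without querying their load."""
    chosen = rng.sample(servers, d)
    for server in chosen:
        peers = [p for p in chosen if p is not server]
        server.enqueue(Replica(job_id, peers))
    return chosen
```

In this sketch, `dispatch` never contacts a server before sending, which mirrors the claim that no scheduling communication sits on the job's critical path; all coordination happens later, when `start_next` removes the sibling replicas.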

Keywords :

cloud computing; computer centres; optimisation; queueing theory; resource allocation; Amazon Elastic Compute Cloud; EC2; cloud systems; communication overhead; complex optimization problem; data centers; distributed server systems; heuristic solutions; inter-server signal propagation delay; job critical path; network-aware load balancing; propagation delays; queuing overhead; real-world environment; scheduling decision; server queue; Analytical models; Delays; Load management; Load modeling; Propagation delay; Queueing analysis; Servers;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

INFOCOM, 2013 Proceedings IEEE

Conference_Location :

Turin

ISSN :

0743-166X

Print_ISBN :

978-1-4673-5944-3

Type :

conf

DOI :

10.1109/INFCOM.2013.6566825

Filename :

6566825

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=623620