DocumentCode :
3077679
Title :
Evaluating the Effectiveness of Replication for Tail-Tolerance
Author :
Qiu, Zhan ; Perez, Juan F.
Author_Institution :
Department of Computing, Imperial College London, London, UK
fYear :
2015
fDate :
4-7 May 2015
Firstpage :
443
Lastpage :
452
Abstract :
Computing clusters (CCs) are a cost-effective, high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time distribution short. In this paper, we explore concurrent replication with cancelling, a tail-tolerant approach that involves processing requests and their replicas concurrently, retrieving the result from the first replica that completes, and cancelling all other replicas. We propose a stochastic model that considers any number of replicas and general processing and inter-arrival times, and that computes the response-time distribution. We show that replication can be very effective in keeping the response-time tail short, but that these benefits depend strongly on the processing-time distribution, as well as on the CC utilization and the statistical characteristics of the arrival process. We also exploit the model to support the selection of the optimal number of replicas and a resource provisioning strategy that meets service-level objectives on the response-time percentiles.
Keywords :
concurrency control; contracts; feature selection; resource allocation; stochastic processes; CC; computing cluster; concurrent replication; model selection; resource provisioning strategy; response-time distribution; service-level objective; stochastic model; tail-tolerant method; Computational modeling; Delays; Noise measurement; Queueing analysis; Servers; Time factors; Transient analysis; Computing Clusters; Latency-tolerance; Matrix Analytic Methods; Performance Evaluation; Tail-tolerance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
Conference_Location :
Shenzhen
Type :
conf
DOI :
10.1109/CCGrid.2015.22
Filename :
7152510