DocumentCode
244344
Title
Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters
Author
Runhui Li ; Lee, Patrick P. C. ; Yuchong Hu
Author_Institution
Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China
fYear
2014
fDate
23-26 June 2014
Firstpage
419
Lastpage
430
Abstract
We have witnessed an increasing adoption of erasure coding in modern clustered storage systems to reduce the storage overhead of traditional 3-way replication. However, how to customize the data analytics paradigm for erasure-coded storage remains an open issue, especially when the storage system operates in failure mode. We propose degraded-first scheduling, a new MapReduce scheduling scheme that improves MapReduce performance in erasure-coded clustered storage systems in failure mode. Its main idea is to launch degraded tasks (map tasks whose input blocks are unavailable and must be reconstructed from other blocks of the same stripe over the network) earlier, so as to leverage the unused network resources. We conduct mathematical analysis and discrete event simulation to show the performance gain of degraded-first scheduling over Hadoop's default locality-first scheduling. We further implement degraded-first scheduling on Hadoop and conduct testbed experiments in a 13-node cluster. We show that degraded-first scheduling reduces the MapReduce runtime compared with locality-first scheduling.
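The intuition behind launching degraded tasks earlier can be illustrated with a toy discrete-event model. This is a minimal sketch under loudly stated assumptions, not the authors' algorithm or their Hadoop implementation: every degraded task first downloads its reconstruction data over a single shared network link (served in FIFO order, taking `t_download` each) and then computes for `t_local`, while local tasks only compute. The degraded-first rule used here (launch a degraded task whenever the fraction of degraded tasks launched does not exceed the fraction of all tasks launched) is an illustrative assumption, and the function name `simulate` and all parameters are hypothetical.

```python
import heapq


def simulate(num_slots, num_local, num_degraded, t_local, t_download, policy):
    """Return the map-phase makespan in a hypothetical toy cluster model.

    Assumptions (illustrative only): identical map slots, one shared FIFO
    network link used only by degraded reads, fixed per-task times.
    """
    if policy not in ("locality-first", "degraded-first"):
        raise ValueError(f"unknown policy: {policy}")
    slots = [0.0] * num_slots          # time at which each map slot becomes free
    heapq.heapify(slots)
    link_free = 0.0                    # time at which the shared link becomes idle
    local_left, degraded_left = num_local, num_degraded
    total = num_local + num_degraded
    launched = launched_degraded = 0
    makespan = 0.0
    while local_left or degraded_left:
        now = heapq.heappop(slots)     # earliest free slot requests a task
        if policy == "locality-first":
            # Degraded tasks run only after all local tasks are launched.
            pick_degraded = local_left == 0
        else:
            # Assumed degraded-first rule: keep degraded launches in step
            # with overall progress so the link is busy early on.
            pick_degraded = degraded_left > 0 and (
                local_left == 0
                or launched_degraded / num_degraded <= launched / total
            )
        if pick_degraded:
            start = max(now, link_free)    # wait for the shared link
            link_free = start + t_download
            finish = link_free + t_local   # compute after reconstruction
            degraded_left -= 1
            launched_degraded += 1
        else:
            finish = now + t_local
            local_left -= 1
        launched += 1
        makespan = max(makespan, finish)
        heapq.heappush(slots, finish)
    return makespan


print(simulate(2, 6, 2, 1.0, 2.0, "locality-first"))   # 8.0
print(simulate(2, 6, 2, 1.0, 2.0, "degraded-first"))   # 6.0
```

In this toy setting, locality-first leaves the link idle while local tasks run and then serializes both degraded downloads at the end, whereas the degraded-first variant overlaps one download with local computation. The absolute numbers carry no meaning beyond the assumed parameters.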
Keywords
data analysis; discrete event simulation; pattern clustering; scheduling; storage management; system recovery; 3-way replication; Hadoop default locality-first scheduling; MapReduce scheduling scheme; data analytics paradigm; degraded-first scheduling; discrete event simulation; erasure coding; erasure-coded clustered storage systems; failure mode; mathematical analysis; network resources; Algorithm design and analysis; Availability; Encoding; Mathematical analysis; Runtime; Scheduling; Switches;
fLanguage
English
Publisher
ieee
Conference_Titel
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location
Atlanta, GA
Type
conf
DOI
10.1109/DSN.2014.47
Filename
6903599