DocumentCode :
244520
Title :
NEWT - A resilient BSP framework for iterative algorithms on Hadoop YARN
Author :
Kromonov, Ilja ; Jakovits, P. ; Srirama, Satish Narayana
Author_Institution :
Inst. of Comput. Sci., Univ. of Tartu, Tartu, Estonia
fYear :
2014
fDate :
21-25 July 2014
Firstpage :
251
Lastpage :
259
Abstract :
The importance of fault tolerance for parallel computing is ever increasing. The mean time between failures (MTBF) is predicted to decrease significantly for future highly parallel systems. At the same time, the current trend to use commodity hardware to reduce the cost of clusters puts pressure on users to ensure fault tolerance of their applications. Cloud-based resources are one of the environments where the latter holds true. When it comes to embarrassingly parallel data-intensive algorithms, MapReduce has gone a long way in ensuring users can easily utilize these resources without the fear of losing work. However, this does not apply to iterative communication-intensive algorithms common in the scientific computing domain. In this work we propose a new programming model inspired by Bulk Synchronous Parallel (BSP), for creating a new fault tolerant distributed computing framework. We strive to retain the advantages that MapReduce provides, yet efficiently support a larger assortment of algorithms, such as the aforementioned iterative ones. The model adopts an approach similar to continuation passing for implementing parallel algorithms and facilitates fault tolerance inherent in the BSP program structure. Based on the model we created a distributed computing framework - NEWT, which we describe and use to validate the approach.
Keywords :
fault tolerant computing; iterative methods; parallel algorithms; parallel programming; Hadoop YARN; MTBF; MapReduce; NEWT framework; bulk synchronous parallel programming; cloud-based resources; fault tolerance; fault tolerant distributed computing framework; iterative algorithms; iterative communication-intensive algorithms; mean time between failure; parallel algorithms; parallel computing; parallel data-intensive algorithms; resilient BSP framework; Adaptation models; Computational modeling; Fault tolerance; Fault tolerant systems; Iterative methods; Programming; Synchronization; Bulk Synchronous Parallel; Hadoop YARN; cloud computing; fault tolerance; iterative algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing & Simulation (HPCS), 2014 International Conference on
Conference_Location :
Bologna
Print_ISBN :
978-1-4799-5312-7
Type :
conf
DOI :
10.1109/HPCSim.2014.6903693
Filename :
6903693