Title :
BlobSeer: Bringing high throughput under heavy concurrency to Hadoop Map-Reduce applications
Author :
Nicolae, Bogdan ; Moise, Diana ; Antoniu, Gabriel ; Bougé, Luc ; Dorier, Matthieu
Author_Institution :
IRISA, Univ. of Rennes 1, Rennes, France
Abstract :
Hadoop is a software framework supporting the Map-Reduce programming model. It relies on the Hadoop Distributed File System (HDFS) as its primary storage system. The efficiency of HDFS is crucial for the performance of Map-Reduce applications. We substitute the original HDFS layer of Hadoop with a new, concurrency-optimized data storage layer based on the BlobSeer data management service. Thereby, the efficiency of Hadoop is significantly improved for data-intensive Map-Reduce applications, which naturally exhibit a high degree of data access concurrency. Moreover, BlobSeer's features (built-in versioning, its support for concurrent append operations) open the possibility for Hadoop to further extend its functionalities. We report on extensive experiments conducted on the Grid'5000 testbed. The results illustrate the benefits of our approach over the original HDFS-based implementation of Hadoop.
Keywords :
concurrency control; distributed processing; storage management; BlobSeer data management service; Hadoop Map-Reduce application; Hadoop distributed file system; Map-Reduce programming model; concurrency-optimized data storage layer; data access concurrency; heavy concurrency; primary storage system; software framework; Application software; Concurrent computing; Data processing; Distributed computing; File servers; File systems; Memory; Parallel programming; Testing; Throughput; BlobSeer; Data-intensive; Distributed file systems; Hadoop; Heavy access concurrency; High Throughput; Large-scale distributed computing; Map-Reduce-based application
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6442-5
DOI :
10.1109/IPDPS.2010.5470433