Title :
BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution
Author :
Teng Wang ; Kevin Vasko ; Zhuo Liu ; Hui Chen ; Weikuan Yu
Author_Institution :
Auburn Univ., Auburn, AL, USA
Abstract :
In today's "Big Data" era, developers have adopted I/O techniques such as MPI-IO, Parallel NetCDF, and HDF5 to obtain the performance needed to manage the vast amounts of data that scientific applications require. These techniques offer parallel access to shared datasets, together with a set of optimizations such as data sieving and two-phase I/O, to boost I/O throughput. While most of these techniques focus on optimizing the access pattern on a single file or file extent, few consider cross-file I/O optimizations. This paper explores the potential benefit of cross-file I/O aggregation. We propose a Bundle-based PARallel Aggregation framework (BPAR) and design three partitioning schemes under this framework that target improving the I/O performance of a mission-critical application, GEOS-5, as well as a broad range of other scientific applications. The results of our experiments reveal that BPAR can achieve, on average, a 2.1× performance improvement over the baseline GEOS-5.
Keywords :
Big Data; input-output programs; BPAR; HDF5; I/O performance; I/O techniques; I/O throughput; MPI-IO; access pattern; baseline GEOS-5; bundle-based parallel aggregation framework; cross-file I/O aggregation; cross-file I/O optimizations; data sieving; decoupled I/O execution; file extent; mission-critical application GEOS-5; parallel NetCDF; partitioning schemes; scientific applications; shared datasets; single file; two-phase I/O; Equations; Optimization; Organizations; Parallel processing; Silicon; Throughput; Writing
Conference_Title :
Data Intensive Scalable Computing Systems (DISCS), 2014 International Workshop on
DOI :
10.1109/DISCS.2014.6