DocumentCode :
659465
Title :
DL-MPI: Enabling data locality computation for MPI-based data-intensive applications
Author :
Jiangling Yin ; Andrew Foran ; Jun Wang
Author_Institution :
Dept. of Electrical Engineering & Computer Science, University of Central Florida, Orlando, FL, USA
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
506
Lastpage :
511
Abstract :
Currently, most scientific applications based on MPI adopt a compute-centric architecture. Needed data is accessed by MPI processes running on different nodes through a shared file system. Unfortunately, the explosive growth of scientific data undermines the high performance of MPI-based applications, especially in the execution environment of commodity clusters. In this paper, we present a novel approach to enable data locality computation for MPI-based data-intensive applications and refer to it as DL-MPI. DL-MPI allows MPI-based programs to obtain data distribution information for compute nodes through a novel data locality API. In addition, the problem of allocating data processing tasks to parallel processes is formulated as an integer optimization problem with the objectives of achieving data locality computation and optimal parallel execution time. For heterogeneous runtime environments, we propose a scheduling algorithm based on probability to dynamically schedule tasks to processes by evaluating the unprocessed local data and the computing ability of each compute node. We demonstrate the functionality of our methods through the implementation of scientific data processing programs as well as the incorporation of DL-MPI with existing HPC applications.
Keywords :
application program interfaces; data handling; integer programming; message passing; parallel processing; probability; scheduling; DL-MPI; HPC applications; MPI-based data-intensive applications; commodity clusters; compute-centric architecture; data locality; data processing tasks; heterogeneous runtime environments; high performance computing; integer optimization problem; message passing interface; parallel execution time; parallel process; scheduling algorithm; scientific data processing programs; shared file system; task scheduling; Bandwidth; Benchmark testing; Computer architecture; Data processing; Distributed databases; Heuristic algorithms; HPC application; Hadoop file system; MPI
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691614
Filename :
6691614