Regional Information Center for Science and Technology - High-Performance Energy-Efficient Recursive Dynamic Programming with Matrix-Multiplication-Like Flexible Kernels

DocumentCode :

3199425

Title :

High-Performance Energy-Efficient Recursive Dynamic Programming with Matrix-Multiplication-Like Flexible Kernels

Author :

Tithi, Jesmin Jahan ; Ganapathi, Pramod ; Talati, Aakrati ; Aggarwal, Sonal ; Chowdhury, Rezaul

Author_Institution :

Dept. of Comput. Sci., Stony Brook Univ., Stony Brook, NY, USA

fYear :

2015

fDate :

25-29 May 2015

Firstpage :

303

Lastpage :

312

Abstract :

Dynamic Programming (DP) problems arise in a wide range of application areas spanning from logistics to computational biology. In this paper, we show how to obtain high-performing parallel implementations for a class of problems by reducing them to highly optimizable flexible kernels through cache-oblivious recursive divide-and-conquer (CORDAC). We implement parallel CORDAC algorithms for four non-trivial DP problems, namely the parenthesization problem, Floyd-Warshall's all-pairs shortest path (FW-APSP), sequence alignment with general gap penalty (gap problem) and protein accordion folding. To the best of our knowledge, our algorithms for protein accordion folding and the gap problem are novel. All four algorithms have asymptotically optimal cache performance, and all but FW-APSP have asymptotically more parallelism than their looping counterparts. We show that the base cases of our CORDAC algorithms are predominantly matrix-multiplication-like (MM-like) flexible kernels that expose many optimization opportunities not offered by traditional looping DP codes. As a result, one can obtain highly efficient DP implementations by optimizing those flexible kernels only. Our implementations achieve 5-150× speedup over their standard loop-based DP counterparts while consuming an order of magnitude less energy on modern multicore machines with 16-32 cores. We also compare our implementations with parallel tiled codes generated by existing polyhedral compilers (Polly, PoCC and PLuTo) and show that our implementations run significantly faster. Finally, we present results on manycores (Intel Xeon Phi) and clusters of multicores obtained using simple extensions for SIMD and shared-distributed-shared-memory architectures, respectively, demonstrating the versatility of our approach. Our optimization approach is highly systematic and suitable for automation.
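
The idea of reducing a looping DP to recursive divide-and-conquer with MM-like base cases can be illustrated on FW-APSP. This record does not include the authors' code, so what follows is only a minimal sequential sketch in Python of the commonly described cache-oblivious recursive formulation of Floyd-Warshall, not the paper's implementation. The names `fw_loop`, `fw_recursive`, `_kernel` and the `base` parameter are hypothetical, and the sketch assumes a square matrix whose size is a power of two, a zero diagonal, and non-negative integer edge weights. Parallelism, SIMD and kernel tuning are deliberately left out.

```python
INF = float("inf")  # marks "no edge"


def fw_loop(dist):
    """Standard triple-loop Floyd-Warshall, used here as the reference."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                cand = d[i][k] + d[k][j]
                if cand < d[i][j]:
                    d[i][j] = cand
    return d


def _kernel(d, i0, j0, k0, size):
    """Base case: update block (i0.., j0..) using intermediates k0..k0+size-1.

    When the i, j and k ranges are disjoint, this is a min-plus
    matrix multiplication, i.e. an MM-like kernel.
    """
    for k in range(k0, k0 + size):
        for i in range(i0, i0 + size):
            for j in range(j0, j0 + size):
                cand = d[i][k] + d[k][j]
                if cand < d[i][j]:
                    d[i][j] = cand


def fw_recursive(dist, base=2):
    """Recursive divide-and-conquer Floyd-Warshall (sequential sketch).

    Assumption: len(dist) is a power of two and base >= 1.
    """
    n = len(dist)
    assert n > 0 and n & (n - 1) == 0, "size must be a power of two"
    assert base >= 1
    d = [row[:] for row in dist]

    def rec(i0, j0, k0, size):
        if size <= base:
            _kernel(d, i0, j0, k0, size)
            return
        h = size // 2
        # Forward pass: first half of the intermediate vertices.
        rec(i0, j0, k0, h)
        rec(i0, j0 + h, k0, h)
        rec(i0 + h, j0, k0, h)
        rec(i0 + h, j0 + h, k0, h)
        # Backward pass: second half, quadrants in reverse order.
        rec(i0 + h, j0 + h, k0 + h, h)
        rec(i0 + h, j0, k0 + h, h)
        rec(i0, j0 + h, k0 + h, h)
        rec(i0, j0, k0 + h, h)

    rec(0, 0, 0, n)
    return d


if __name__ == "__main__":
    graph = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ]
    for row in fw_recursive(graph, base=1):
        print(row)
```

In this sketch, calls whose row, column and intermediate ranges do not overlap perform only independent min-plus updates, which is why a tuned MM-like kernel would be the natural place to spend optimization effort.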

Keywords :

divide and conquer methods; dynamic programming; mathematics computing; matrix multiplication; parallel algorithms; DP problem; FW-APSP; Floyd-Warshall all-pairs shortest path; cache-oblivious recursive divide-and-conquer; dynamic programming; gap penalty; high-performing parallel implementation; matrix-multiplication-like flexible kernel; optimization; parallel CORDAC algorithm; cache-oblivious; divide-and-conquer; dynamic programming; flexible kernel; polyhedral compiler; recursive;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International

Conference_Location :

Hyderabad

ISSN :

1530-2075

Type :

conf

DOI :

10.1109/IPDPS.2015.107

Filename :

7161519

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3199425