DocumentCode :
3199425
Title :
High-Performance Energy-Efficient Recursive Dynamic Programming with Matrix-Multiplication-Like Flexible Kernels
Author :
Tithi, Jesmin Jahan ; Ganapathi, Pramod ; Talati, Aakrati ; Aggarwal, Sonal ; Chowdhury, Rezaul
Author_Institution :
Dept. of Comput. Sci., Stony Brook Univ., Stony Brook, NY, USA
fYear :
2015
fDate :
25-29 May 2015
Firstpage :
303
Lastpage :
312
Abstract :
Dynamic Programming (DP) problems arise in a wide range of application areas spanning from logistics to computational biology. In this paper, we show how to obtain high-performing parallel implementations for a class of problems by reducing them to highly utilizable flexible kernels through cache-oblivious recursive divide-and-conquer (CORDAC). We implement parallel CORDAC algorithms for four non-trivial DP problems, namely the parenthesization problem, Floyd-Warshall's all-pairs shortest path (FW-APSP), sequence alignment with general gap penalty (gap problem), and protein accordion folding. To the best of our knowledge, our algorithms for protein accordion folding and the gap problem are novel. All four algorithms have asymptotically optimal cache performance, and all but FW-APSP have asymptotically more parallelism than their looping counterparts. We show that the base cases of our CORDAC algorithms are predominantly matrix-multiplication-like (MM-like) flexible kernels that expose many optimization opportunities not offered by traditional looping DP codes. As a result, one can obtain highly efficient DP implementations by optimizing those flexible kernels only. Our implementations achieve 5-150× speedup over their standard loop-based DP counterparts while consuming an order of magnitude less energy on modern multicore machines with 16-32 cores. We also compare our implementations with parallel tiled codes generated by existing polyhedral compilers (Polly, PoCC, and PLuTo) and show that our implementations run significantly faster. Finally, we present results on manycores (Intel Xeon Phi) and clusters of multicores obtained using simple extensions for SIMD and shared-distributed-shared-memory architectures, respectively, demonstrating the versatility of our approach. Our optimization approach is highly systematic and suitable for automation.
Keywords :
divide and conquer methods; dynamic programming; mathematics computing; matrix multiplication; parallel algorithms; DP problem; FW-APSP; Floyd-Warshall all-pairs shortest path; cache-oblivious recursive divide-and-conquer; gap penalty; high-performing parallel implementation; matrix-multiplication-like flexible kernel; optimization; parallel CORDAC algorithm; cache-oblivious; divide-and-conquer; flexible kernel; polyhedral compiler; recursive
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
ISSN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2015.107
Filename :
7161519