DocumentCode :
2688309
Title :
Improving the Accuracy of High Performance BLAS Implementations Using Adaptive Blocked Algorithms
Author :
Badin, Matthew ; D'Alberto, Paolo ; Bic, Lubomir ; Dillencourt, Michael ; Nicolau, Alexandru
Author_Institution :
Univ. of California Irvine, Irvine, CA, USA
fYear :
2011
fDate :
26-29 Oct. 2011
Firstpage :
120
Lastpage :
127
Abstract :
Matrix multiply is ubiquitous in scientific computing. Considerable effort has been spent on improving its performance. Once methods that make efficient use of the processor have been exhausted, methods that use fewer operations than the canonical matrix multiply must be explored. Combining the two methods yields a hybrid matrix multiply algorithm. Hybrid matrix multiply algorithms tend to be less accurate than the canonical matrix multiply implementation, leaving room for improvement. There are well-known techniques for improving accuracy, but they tend to be slow and it is not immediately obvious how best to apply them to hybrid algorithms without lowering performance. Previous attempts have focused on the bottom of the hybrid matrix multiply algorithm, modifying the high-performance matrix multiply implementation. In contrast, the top-down approach presented here does not require the modification of the high-performance matrix multiply implementation at the bottom, nor does it require modification of the fast asymptotic matrix multiply algorithm at the top. The three-level hybrid algorithm presented here not only has up to 10% better performance than the fastest high-performance matrix multiply, but is also more accurate.
Keywords :
mathematics computing; matrix multiplication; recursive functions; adaptive blocked algorithm; canonical matrix multiply; high performance BLAS implementation; high-performance matrix multiply implementation; hybrid matrix multiply algorithm; recursive matrix multiply; scientific computing; Accuracy; Computer architecture; Context; Kernel; Strips; Tiles; USA Councils; Hybrid Matrix Multiply; Pairwise Summation; Recursive Matrix Multiply;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Architecture and High Performance Computing (SBAC-PAD), 2011 23rd International Symposium on
Conference_Location :
Vitoria, Espirito Santo
ISSN :
1550-6533
Print_ISBN :
978-1-4577-2050-5
Type :
conf
DOI :
10.1109/SBAC-PAD.2011.21
Filename :
6106013