Title :
Performance modeling and optimal block size selection for a BLAS-3 based tridiagonalization algorithm
Author :
Yamamoto, Yusaku
Author_Institution :
Dept. of Computational Sci. & Eng., Nagoya Univ.
Abstract :
We construct a performance model for Bischof & Wu's tridiagonalization algorithm, which is based entirely on the level-3 BLAS. The model has a hierarchical structure that mirrors that of the original algorithm and, given the matrix size, the two block sizes, and performance data for the underlying BLAS routines, predicts the execution time of the algorithm. Experiments on the Opteron and Alpha 21264A processors show that the model predicts the performance of the algorithm for matrix sizes from 1920 to 7680 and for various block sizes with relative errors below 10%. The model will serve as a key component of an automatically tuned library that selects the optimal block sizes itself. It can also be used in a grid environment to help users determine which of the available machines will solve their problem in the shortest time.
Keywords :
grid computing; matrix decomposition; parallel machines; performance evaluation; software libraries; Alpha 21264A processors; BLAS-3 based tridiagonalization algorithm; Bischof & Wu tridiagonalization algorithm; Opteron; automatic tuned library; block size selection; grid environment; performance modeling; Bandwidth; Eigenvalues and eigenfunctions; Libraries; Linear algebra; Microprocessors; Predictive models
Conference_Title :
High-Performance Computing in Asia-Pacific Region, 2005. Proceedings. Eighth International Conference on
Conference_Location :
Beijing
Print_ISBN :
0-7695-2486-9
DOI :
10.1109/HPCASIA.2005.76