Author :
Yamazaki, Ichitaro ; Mary, Theo ; Kurzak, Jakub ; Tomov, Stanimire ; Dongarra, Jack
Author_Institution :
Dept. of Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
Abstract :
Low-rank matrix approximations play important roles in many statistical, scientific, and engineering applications. To compute such approximations, different algorithms have been developed by researchers from a wide range of areas including theoretical computer science, numerical linear algebra, statistics, applied mathematics, data analysis, machine learning, and physical and biological sciences. In this paper, to combine these efforts, we present an “access-averse” framework that encapsulates some of the existing algorithms for computing a truncated singular value decomposition (SVD). This framework not only allows us to develop software whose performance can be tuned based on domain-specific knowledge, but it also allows a user from one discipline to test an algorithm from another, or to combine the techniques from different algorithms. To demonstrate this potential, we implement the framework on multicore CPUs with multiple GPUs and compare the performance of two representative algorithms, blocked variants of matrix power and Lanczos methods. Our performance studies with large-scale graphs from real applications demonstrate that, when combined with communication-avoiding and thick-restarting techniques, the Lanczos method can be competitive with the power method, which is one of the most popular methods currently used for these applications. In addition, though we only focus on truncated SVDs, the two computational kernels used in our studies, the sparse-matrix dense-matrix multiply and tall-skinny QR factorization, are fundamental building blocks for computing low-rank approximations with other objectives. Hence, our studies may have an impact beyond truncated SVDs.
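To illustrate how the two kernels named above combine, the following is a minimal sketch of a generic blocked power (subspace) iteration for a truncated SVD, written with NumPy. It is not the authors' implementation: it omits the communication-avoiding and thick-restarting techniques, runs on a single CPU, and uses a dense QR (np.linalg.qr) as a stand-in for a tall-skinny QR kernel. The function name block_power_svd and its parameters (k, block, iters, seed) are hypothetical. Each "A @ V" or "A.T @ U" product plays the role of the sparse-matrix dense-matrix multiply when A is stored as a sparse matrix.

```python
import numpy as np


def block_power_svd(A, k, block=None, iters=20, seed=0):
    """Hypothetical sketch: approximate the k largest singular triplets of A
    by blocked power (subspace) iteration followed by a Rayleigh-Ritz step.

    A may be a dense NumPy array or any object supporting A @ X and A.T @ Y
    (e.g. a SciPy sparse matrix), so each product stands in for SpMM.
    """
    m, n = A.shape
    b = block if block is not None else k
    if b < k:
        raise ValueError("block size must be at least k")

    rng = np.random.default_rng(seed)
    # Random n-by-b starting block, orthonormalized by a (tall-skinny) QR.
    V, _ = np.linalg.qr(rng.standard_normal((n, b)))

    for _ in range(iters):
        # SpMM with A, then orthonormalize the m-by-b result (TSQR stand-in).
        U, _ = np.linalg.qr(A @ V)
        # SpMM with A^T, then orthonormalize the n-by-b result.
        V, _ = np.linalg.qr(A.T @ U)

    # Rayleigh-Ritz: project A onto the two subspaces and take a small SVD.
    B = U.T @ (A @ V)                 # b-by-b projected matrix
    Ub, s, Vbt = np.linalg.svd(B)
    U_k = U @ Ub[:, :k]               # approximate left singular vectors
    V_k = V @ Vbt.T[:, :k]            # approximate right singular vectors
    return U_k, s[:k], V_k
```

Using a block size b somewhat larger than k is a common design choice in such sketches: the convergence rate then depends on the gap between the k-th and (b+1)-th singular values rather than between the k-th and (k+1)-th, at the cost of wider SpMM and QR operations.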
Keywords :
approximation theory; graphics processing units; mathematics computing; matrix multiplication; multiprocessing systems; singular value decomposition; sparse matrices; GPU; Lanczos method; SVD; access-averse framework; communication-avoiding techniques; computational kernels; domain specific knowledge; low-rank matrix approximation computation; matrix power blocked variants; multicore CPU; power method; sparse-matrix dense-matrix multiplication; tall-skinny QR factorization; thick-restarting techniques; truncated singular value decomposition; Approximation algorithms; Approximation methods; Convergence; Kernel; Software algorithms; Sparse matrices; Vectors;