Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems

Author

Song, Fengguang; YarKhan, Asim; Dongarra, Jack

Author_Institution

EECS Dept., Univ. of Tennessee, Knoxville, TN, USA

fYear

2009

fDate

14-20 Nov. 2009

Firstpage

1

Lastpage

11

Abstract

This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared-memory or distributed-memory). We use a task-based library to replace existing linear algebra subroutines such as PBLAS, transparently providing the same interface and computational function as the ScaLAPACK library. Linear algebra programs are written with the task-based library and executed by a dynamic runtime system. The design of our runtime system focuses mainly on performance scalability. We propose a distributed algorithm that resolves data dependences without process cooperation. We have implemented the runtime system and applied it to three linear algebra algorithms: Cholesky, LU, and QR factorizations. Our experiments on both shared-memory machines (16 and 32 cores) and distributed-memory machines (1024 cores) demonstrate that our runtime system achieves good scalability. Furthermore, we provide an analysis that shows why the tiled algorithms are scalable and estimates their expected execution time.
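
The idea of dynamically scheduling a tiled factorization can be illustrated with a small sketch. The code below is not the paper's runtime system or its distributed algorithm; it is a single-process illustration under stated assumptions. It enumerates the tasks of a standard tiled Cholesky factorization, derives data dependences from the tiles each task reads and writes, and releases a task as soon as all of its predecessors have finished. The function names (cholesky_tasks, build_dag, run) are hypothetical, and the tasks are only recorded, not computed numerically.

```python
from collections import defaultdict, deque


def cholesky_tasks(nt):
    """List (name, reads, writes) for a tiled Cholesky on an nt x nt tile grid.

    Tiles are (row, col) pairs; a written tile is updated in place.
    """
    tasks = []
    for k in range(nt):
        tasks.append((f"POTRF({k})", [], [(k, k)]))
        for i in range(k + 1, nt):
            tasks.append((f"TRSM({i},{k})", [(k, k)], [(i, k)]))
        for i in range(k + 1, nt):
            tasks.append((f"SYRK({i},{k})", [(i, k)], [(i, i)]))
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j},{k})", [(i, k), (j, k)], [(i, j)]))
    return tasks


def build_dag(tasks):
    """Derive dependence edges from the sequential task order."""
    last_writer = {}              # tile -> index of the last task that wrote it
    readers = defaultdict(list)   # tile -> readers since the last write
    succ = defaultdict(set)       # task index -> successor task indices
    indeg = [0] * len(tasks)      # number of unfinished predecessors

    def add_edge(u, v):
        if u != v and v not in succ[u]:
            succ[u].add(v)
            indeg[v] += 1

    for t, (_, reads, writes) in enumerate(tasks):
        for tile in reads:
            if tile in last_writer:          # read-after-write
                add_edge(last_writer[tile], t)
            readers[tile].append(t)
        for tile in writes:
            if tile in last_writer:          # write-after-write
                add_edge(last_writer[tile], t)
            for r in readers[tile]:          # write-after-read
                add_edge(r, t)
            last_writer[tile] = t
            readers[tile] = []
    return succ, indeg


def run(tasks):
    """Execute tasks in a dependence-respecting order using a ready queue."""
    succ, indeg = build_dag(tasks)
    ready = deque(t for t, d in enumerate(indeg) if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(tasks[t][0])            # stand-in for running the kernel
        for s in sorted(succ[t]):
            indeg[s] -= 1
            if indeg[s] == 0:                # all inputs ready: release task
                ready.append(s)
    return order


if __name__ == "__main__":
    for name in run(cholesky_tasks(3)):
        print(name)
```

In a real runtime the ready queue would feed multiple worker threads or processes, so several independent tasks could run at once; the sketch serializes them only to keep the example short.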

Keywords

distributed algorithms; distributed memory systems; linear algebra; mathematics computing; matrix decomposition; processor scheduling; shared memory systems; software libraries; Cholesky factorization; LU factorization; PBLAS; QR factorization; ScaLAPACK library; analytical analysis; computational function; data dependency; dense linear algebra algorithms; distributed algorithm; distributed-memory machines; distributed-memory multicore systems; distributed-memory system; dynamic runtime system; dynamic task scheduling; linear algebra programs; linear algebra subroutines; performance scalability; process cooperation; runtime system design; shared-memory machines; shared-memory system; task-based library; tiled algorithms;

fLanguage

English

Publisher

IEEE

Conference_Titel

High Performance Computing Networking, Storage and Analysis, Proceedings of the Conference on

Conference_Location

Portland, OR

Type

conf

DOI

10.1145/1654059.1654079

Filename

6375569