Regional Information Center for Science and Technology - Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA

DocumentCode :

3145659

Title :

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA

Author :

Bosilca, George ; Bouteiller, Aurelien ; Danalis, Anthony ; Faverge, Mathieu ; Haidar, Azzam ; Herault, Thomas ; Kurzak, Jakub ; Langou, Julien ; Lemarinier, Pierre ; Ltaief, Hatem ; Luszczek, Piotr ; YarKhan, Asim ; Dongarra, Jack

Author_Institution :

Innovative Comput. Lab., Univ. of Tennessee, Knoxville, TN, USA

fYear :

2011

fDate :

16-20 May 2011

Firstpage :

1432

Lastpage :

1441

Abstract :

We present a method for developing dense linear algebra algorithms that seamlessly scale to thousands of cores. The method is implemented in our project DPLASMA (Distributed PLASMA), which uses a novel generic distributed Directed Acyclic Graph Engine (DAGuE). The engine has been designed for high performance computing and thus enables scaling of tile algorithms, originating in PLASMA, on large distributed memory systems. The underlying DAGuE framework has many appealing features for distributed-memory platforms with heterogeneous multicore nodes: a DAG representation that is independent of the problem size, automatic extraction of the communication from the dependencies, overlapping of communication and computation, task prioritization, and architecture-aware scheduling and management of tasks. The originality of this engine lies in its capacity to translate a sequential code with nested loops into a concise and synthetic format, which can then be interpreted and executed in a distributed environment. We present three common dense linear algebra algorithms from PLASMA (Parallel Linear Algebra for Scalable Multi-core Architectures), namely the Cholesky, LU, and QR factorizations, to investigate their data-driven expression and execution in a distributed system. We demonstrate through experimental results on the Cray XT5 Kraken system that our DAG-based approach has the potential to achieve a sizable fraction of peak performance, which is characteristic of state-of-the-art distributed numerical software on current and emerging architectures.
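
To illustrate the idea of turning a sequential nested-loop code into a task graph, the following is a minimal sketch in Python. It is not DAGuE's format or API: the function names `tile_cholesky_tasks` and `build_dag` are hypothetical, and the sketch enumerates every task explicitly, whereas the abstract describes a representation that is independent of the problem size. It walks the loop nest of a tile Cholesky factorization (kernels POTRF, TRSM, SYRK, GEMM), records which tiles each task reads and updates, and derives dependencies by linking each task to the last writer of every tile it touches.

```python
def tile_cholesky_tasks(nt):
    """Yield (task, tiles_read, tiles_updated) in sequential loop order
    for the lower-triangular tile Cholesky of an nt x nt tile matrix."""
    for k in range(nt):
        # Factor the diagonal tile (k, k).
        yield ("POTRF", k), [], [(k, k)]
        for m in range(k + 1, nt):
            # Triangular solve of tile (m, k) against the factored diagonal.
            yield ("TRSM", m, k), [(k, k)], [(m, k)]
        for m in range(k + 1, nt):
            # Symmetric rank-k update of the trailing diagonal tile (m, m).
            yield ("SYRK", m, k), [(m, k)], [(m, m)]
            for n in range(k + 1, m):
                # General update of the trailing off-diagonal tile (m, n).
                yield ("GEMM", m, n, k), [(m, k), (n, k)], [(m, n)]


def build_dag(nt):
    """Map each task to the set of tasks it must wait for.

    Assumption of this sketch: updated tiles are read-modify-write, so a
    task depends on the last writer of every tile it reads or updates.
    Write-after-read hazards are not tracked; they do not occur in this
    particular loop nest because tiles are never rewritten after being
    read as inputs.
    """
    last_writer = {}
    deps = {}
    for task, reads, updates in tile_cholesky_tasks(nt):
        deps[task] = {last_writer[t] for t in reads + updates if t in last_writer}
        for t in updates:
            last_writer[t] = task
    return deps


if __name__ == "__main__":
    for task, preds in build_dag(3).items():
        print(task, "<-", sorted(preds))
```

Tasks whose predecessor sets are disjoint and already satisfied can run concurrently, and an edge between tasks placed on different nodes corresponds to a message; this is the sense in which communication can be extracted from the dependencies.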

Keywords :

directed graphs; matrix decomposition; multiprocessing systems; parallel architectures; Cholesky factorization; Cray XT5 Kraken system; DAGuE framework; DPLASMA; LU factorization; QR factorization; architecture-aware scheduling; dense linear algebra algorithms; direct acyclic graph engine; distributed PLASMA; distributed memory systems; high performance computing; massively parallel architectures; multicore nodes; nested loops; parallel linear algebra; scalable multicore architectures; sequential code translation; task management; task prioritization; Engines; Heuristic algorithms; Linear algebra; Multicore processing; Plasmas; Tiles;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on

Conference_Location :

Shanghai

ISSN :

1530-2075

Print_ISBN :

978-1-61284-425-1

Electronic_ISBN :

1530-2075

Type :

conf

DOI :

10.1109/IPDPS.2011.299

Filename :

6008998

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3145659