DocumentCode :
2801005
Title :
Exploiting memory hierarchies in scientific computing
Author :
Bader, Michael ; Weidendorfer, Josef
Author_Institution :
Dept. of Inf., Tech. Univ. München, Munich, Germany
fYear :
2009
fDate :
21-24 June 2009
Firstpage :
33
Lastpage :
35
Abstract :
The gap between processor and main memory performance has been widening for quite some time, and can safely be expected to keep doing so in the coming years. In the era of single-core processors, this gap was mainly visible as increased latency, for example when measured in (possibly stalled) CPU clock cycles. Nowadays, with multicore chips, multiple cores share the same connection to off-chip main memory, which effectively reduces the available bandwidth as well. Caches help in both cases: being located on-chip, they provide both much lower latency and much higher bandwidth. By holding copies of recently used memory blocks, caches exploit the fact that programs, on average, tend to access the same memory cell repeatedly (temporal locality) or nearby memory cells (spatial locality). However, this natural locality is not sufficient for scientific computing in HPC; further improving the access locality of existing algorithms is highly desirable. In this talk, we present strategies to improve the locality of memory accesses for linear algebra problems occurring in different kinds of applications: (1) an algorithmic approach based on Peano space-filling curves that leads to inherently cache-efficient (cache-oblivious) matrix algorithms, such as matrix multiplication or LU decomposition for dense and sparse matrices, on single-core CPUs as well as on shared-memory multicore platforms; (2) cache optimization strategies for matrix-vector multiplications with very large, sparse matrices, as they occur in the iterative MLEM algorithm used for image reconstruction in nuclear medicine. Here, different cache-aware optimization strategies are combined in order to better exploit large caches, small caches, and single cache lines.
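To illustrate the cache-oblivious idea mentioned in point (1), the following is a minimal sketch of matrix multiplication by recursive halving. It is not the authors' Peano-curve scheme, only the standard divide-and-conquer formulation that achieves locality at every cache level without knowing cache sizes; the size cutoff and layout assumptions are illustrative.

```c
#define CUTOFF 32  /* assumed block size below which the naive loop is used */

/* Cache-oblivious multiply-accumulate C += A*B for n x n blocks stored
 * row-major with leading dimension ld; n is assumed to be a power of two. */
static void matmul_rec(const double *A, const double *B, double *C,
                       int n, int ld)
{
    if (n <= CUTOFF) {
        /* base case: naive triple loop on a small, cache-resident block */
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    C[i * ld + j] += A[i * ld + k] * B[k * ld + j];
        return;
    }
    int h = n / 2;
    /* quadrants of each operand; recursion keeps operands small enough to
     * fit in successively smaller cache levels */
    const double *A11 = A,          *A12 = A + h,
                 *A21 = A + h * ld, *A22 = A + h * ld + h;
    const double *B11 = B,          *B12 = B + h,
                 *B21 = B + h * ld, *B22 = B + h * ld + h;
    double       *C11 = C,          *C12 = C + h,
                 *C21 = C + h * ld, *C22 = C + h * ld + h;

    /* eight recursive block products, e.g. C11 += A11*B11 + A12*B21 */
    matmul_rec(A11, B11, C11, h, ld);  matmul_rec(A12, B21, C11, h, ld);
    matmul_rec(A11, B12, C12, h, ld);  matmul_rec(A12, B22, C12, h, ld);
    matmul_rec(A21, B11, C21, h, ld);  matmul_rec(A22, B21, C21, h, ld);
    matmul_rec(A21, B12, C22, h, ld);  matmul_rec(A22, B22, C22, h, ld);
}
```

For point (2), the kernel being optimized is a sparse matrix-vector product. The sketch below uses the common CSR (compressed sparse row) layout; the struct and field names are generic conventions, not identifiers from the authors' MLEM code. It shows where the locality problem lies: the nonzeros stream through the cache, while the accesses to the input vector x are irregular, which is what the combined cache-aware strategies target.

```c
typedef struct {
    int     n_rows;
    int    *row_ptr;  /* n_rows + 1 entries: start of each row in val/col_idx */
    int    *col_idx;  /* column index of each stored nonzero */
    double *val;      /* value of each stored nonzero */
} csr_matrix;

/* y = A * x for a CSR matrix A */
void spmv_csr(const csr_matrix *A, const double *x, double *y)
{
    for (int i = 0; i < A->n_rows; i++) {
        double sum = 0.0;
        /* nonzeros of row i are contiguous: good spatial locality on val and
         * col_idx, but x[col_idx[k]] jumps around the vector */
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```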
Keywords :
cache storage; image reconstruction; matrix decomposition; matrix multiplication; microprocessor chips; natural sciences computing; sparse matrices; CPU clock cycle; LU decomposition; Peano spacefilling curve; cache efficient matrix algorithm; cache oblivious matrix algorithm; cache optimization; cache storage; cache-aware optimization; image reconstruction; iterative MLEM algorithm; linear algebra problem; matrix-vector multiplication; memory access; memory blocks; memory hierarchy; memory performance; multicore chip; nuclear medicine; off-chip main memory; processor performance; scientific computing; shared-memory multicore platform; sparse matrices; spatial locality; temporal locality; Bandwidth; Clocks; Delay; Iterative algorithms; Linear algebra; Matrix decomposition; Multicore processing; Scientific computing; Semiconductor device measurement; Sparse matrices; Cache optimization; cache oblivious algorithms; cache simulation; image reconstruction; sparse matrix vector multiplication;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2009 International Conference on High Performance Computing & Simulation (HPCS '09)
Conference_Location :
Leipzig
Print_ISBN :
978-1-4244-4906-4
Electronic_ISBN :
978-1-4244-4907-1
Type :
conf
DOI :
10.1109/HPCSIM.2009.5192891
Filename :
5192891