DocumentCode :
2801005
Title :
Exploiting memory hierarchies in scientific computing
Author :
Bader, Michael ; Weidendorfer, Josef
Author_Institution :
Dept. of Inf., Tech. Univ. München, Munich, Germany
fYear :
2009
fDate :
21-24 June 2009
Firstpage :
33
Lastpage :
35
Abstract :
The gap between processor and main memory performance has been widening for quite some time, and can safely be expected to keep doing so in the coming years. In the era of single-core processors, this gap was mainly visible as increased latency, for example when measured in (possibly stalled) CPU clock cycles. Nowadays, with multicore chips, multiple cores share the same connection to off-chip main memory, which effectively reduces the available bandwidth as well. Caches help in both cases: being located on-chip, they provide both much lower latency and much higher bandwidth. By holding copies of recently used memory blocks, caches exploit the fact that programs, on average, tend to access the same memory cell repeatedly (temporal locality) or nearby memory cells (spatial locality). However, this natural locality is not sufficient for scientific computing in HPC; further improving the access locality of existing algorithms is highly desirable. In this talk, we present strategies to improve the locality of memory accesses for linear algebra problems occurring in different kinds of applications: (1) an algorithmic approach based on Peano space-filling curves that leads to inherently cache-efficient (cache-oblivious) matrix algorithms, such as matrix multiplication or LU decomposition for dense and sparse matrices, on single-core CPUs as well as on shared-memory multicore platforms; (2) cache optimization strategies for matrix-vector multiplications with very large, sparse matrices, as they occur in the iterative MLEM algorithm used for image reconstruction in nuclear medicine. Here, different cache-aware optimization strategies are combined in order to better exploit large caches, small caches, and single cache lines.
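To illustrate the cache-oblivious idea mentioned in point (1), the following is a minimal sketch of matrix multiplication by recursive halving. It is not the authors' Peano-curve scheme, only the standard divide-and-conquer formulation that achieves locality at every cache level without knowing cache sizes; the size cutoff and layout assumptions are illustrative.

```c
#define CUTOFF 32  /* assumed block size below which the naive loop is used */

/* Cache-oblivious multiply-accumulate C += A*B for n x n blocks stored
 * row-major with leading dimension ld; n is assumed to be a power of two. */
static void matmul_rec(const double *A, const double *B, double *C,
                       int n, int ld)
{
    if (n <= CUTOFF) {
        /* base case: naive triple loop on a small, cache-resident block */
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    C[i * ld + j] += A[i * ld + k] * B[k * ld + j];
        return;
    }
    int h = n / 2;
    /* quadrants of each operand; recursion keeps operands small enough to
     * fit in successively smaller cache levels */
    const double *A11 = A,          *A12 = A + h,
                 *A21 = A + h * ld, *A22 = A + h * ld + h;
    const double *B11 = B,          *B12 = B + h,
                 *B21 = B + h * ld, *B22 = B + h * ld + h;
    double       *C11 = C,          *C12 = C + h,
                 *C21 = C + h * ld, *C22 = C + h * ld + h;

    /* eight recursive block products, e.g. C11 += A11*B11 + A12*B21 */
    matmul_rec(A11, B11, C11, h, ld);  matmul_rec(A12, B21, C11, h, ld);
    matmul_rec(A11, B12, C12, h, ld);  matmul_rec(A12, B22, C12, h, ld);
    matmul_rec(A21, B11, C21, h, ld);  matmul_rec(A22, B21, C21, h, ld);
    matmul_rec(A21, B12, C22, h, ld);  matmul_rec(A22, B22, C22, h, ld);
}
```

For point (2), the kernel being optimized is a sparse matrix-vector product. The sketch below uses the common CSR (compressed sparse row) layout; the struct and field names are generic conventions, not identifiers from the authors' MLEM code. It shows where the locality problem lies: the nonzeros stream through the cache, while the accesses to the input vector x are irregular, which is what the combined cache-aware strategies target.

```c
typedef struct {
    int     n_rows;
    int    *row_ptr;  /* n_rows + 1 entries: start of each row in val/col_idx */
    int    *col_idx;  /* column index of each stored nonzero */
    double *val;      /* value of each stored nonzero */
} csr_matrix;

/* y = A * x for a CSR matrix A */
void spmv_csr(const csr_matrix *A, const double *x, double *y)
{
    for (int i = 0; i < A->n_rows; i++) {
        double sum = 0.0;
        /* nonzeros of row i are contiguous: good spatial locality on val and
         * col_idx, but x[col_idx[k]] jumps around the vector */
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```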
Keywords :
cache storage; image reconstruction; matrix decomposition; matrix multiplication; microprocessor chips; natural sciences computing; sparse matrices; CPU clock cycle; LU decomposition; Peano spacefilling curve; cache efficient matrix algorithm; cache oblivious matrix algorithm; cache optimization; cache storage; cache-aware optimization; image reconstruction; iterative MLEM algorithm; linear algebra problem; matrix-vector multiplication; memory access; memory blocks; memory hierarchy; memory performance; multicore chip; nuclear medicine; off-chip main memory; processor performance; scientific computing; shared-memory multicore platform; sparse matrices; spatial locality; temporal locality; Bandwidth; Clocks; Delay; Iterative algorithms; Linear algebra; Matrix decomposition; Multicore processing; Scientific computing; Semiconductor device measurement; Sparse matrices; Cache optimization; cache oblivious algorithms; cache simulation; image reconstruction; sparse matrix vector multiplication;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2009 International Conference on High Performance Computing & Simulation (HPCS '09)
Conference_Location :
Leipzig
Print_ISBN :
978-1-4244-4906-4
Electronic_ISBN :
978-1-4244-4907-1
Type :
conf
DOI :
10.1109/HPCSIM.2009.5192891
Filename :
5192891