L2 Cache Performance Analysis and Optimizations for Processing HDF5 Data on Multi-core Nodes

Author

Bhowmik, Rajdeep ; Govindaraju, Madhusudhan

Author_Institution

Dept. of Comput. Sci., SUNY Binghamton, Binghamton, NY, USA

fYear

2012

fDate

10-13 July 2012

Firstpage

142

Lastpage

149

Abstract

It is important to design and develop scientific middleware libraries that harness the opportunities presented by the emerging multi-core processors available in grid and cloud environments. Scientific middleware libraries that do not adhere or adapt to this multi-threaded programming paradigm can suffer severe performance limitations when executing on these processors. In this paper, we focus on the utilization of a critical shared resource on chip multiprocessors (CMPs): the L2 cache. The way an application schedules and assigns processing work to each thread determines the access pattern of the shared L2 cache, which can either amplify or mitigate the effects of memory latency on a multi-core processor. Therefore, when processing scientific datasets such as those stored in HDF5, it is essential to conduct a fine-grained analysis of cache utilization in order to make informed processing and scheduling decisions in multi-threaded programs. Using the TAU toolkit to obtain performance feedback from dual- and quad-core machines, we analyze and recommend methods for effectively scheduling threads on multi-core nodes to improve the performance of scientific applications that process HDF5 data. We discuss the benefits of L2 Cache-Affinity and L2 Balanced-Set based scheduling algorithms for improving L2 cache performance and, consequently, overall execution time.

Keywords

cache storage; microprocessor chips; middleware; multi-threading; multiprocessing systems; natural sciences computing; processor scheduling; programming; HDF5 data processing; L2 balanced-set based scheduling algorithm; L2 cache performance analysis; L2 cache-affinity based scheduling algorithm; TAU toolkit; application scheduling; cache utilization; chip multiprocessor; cloud environment; critical shared resource utilization; dual-core machine; fine-grained analysis; grid environment; hierarchical data format; memory latency; multicore node; multicore processor; multithreaded programming; performance feedback; performance limitation; processing work assignment; programming paradigm; quad-core machine; scientific application; scientific dataset processing; scientific middleware library; shared L2 cache; thread scheduling; Hardware; Instruction sets; Libraries; Middleware; Multicore processing; Optimization; Processor scheduling; HDF5; L2 cache; multi-core;

fLanguage

English

Publisher

IEEE

Conference_Title

Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on

Conference_Location

Leganes

Print_ISBN

978-1-4673-1631-6

Type

conf

DOI

10.1109/ISPA.2012.27

Filename

6280286