• DocumentCode
    3089427
  • Title

    L2 Cache Performance Analysis and Optimizations for Processing HDF5 Data on Multi-core Nodes

  • Author

    Bhowmik, Rajdeep ; Govindaraju, Madhusudhan

  • Author_Institution
    Dept. of Comput. Sci., SUNY Binghamton, Binghamton, NY, USA
  • fYear
    2012
  • fDate
    10-13 July 2012
  • Firstpage
    142
  • Lastpage
    149
  • Abstract
    It is important to design and develop scientific middleware libraries that harness the opportunities presented by emerging multi-core processors available in grid and cloud environments. Scientific middleware libraries that do not adhere or adapt to this programming paradigm can suffer severe performance limitations when executing on such processors. In this paper, we focus on the utilization of a critical shared resource on chip multiprocessors (CMPs), the L2 cache. The way in which an application schedules and assigns processing work to each thread determines the access pattern of the shared L2 cache, which can either amplify or reduce the effects of memory latency on a multi-core processor. Therefore, while processing scientific datasets in formats such as HDF5, it is essential to conduct fine-grained analysis of cache utilization in order to make informed processing and scheduling decisions in multi-threaded programs. Using the TAU toolkit for performance feedback from dual- and quad-core machines, we analyze and recommend methods for effectively scheduling threads on multi-core nodes to improve the performance of scientific applications that process HDF5 data. We discuss the benefits that can be achieved by using L2 Cache-Affinity and L2 Balanced-Set based scheduling algorithms to improve L2 cache performance and, in turn, overall execution time.
  • Keywords
    cache storage; microprocessor chips; middleware; multi-threading; multiprocessing systems; natural sciences computing; processor scheduling; programming; HDF5 data processing; L2 balanced-set based scheduling algorithm; L2 cache performance analysis; L2 cache-affinity based scheduling algorithm; TAU toolkit; application scheduling; cache utilization; chip multiprocessor; cloud environment; critical shared resource utilization; dual-core machine; fine-grained analysis; grid environment; hierarchical data format; memory latency; multicore node; multicore processor; multithreaded programming; performance feedback; performance limitation; processing work assignment; programming paradigm; quad-core machine; scientific application; scientific dataset processing; scientific middleware library; shared L2 cache; thread scheduling; Hardware; Instruction sets; Libraries; Middleware; Multicore processing; Optimization; Processor scheduling; HDF5; L2 cache; multi-core;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on
  • Conference_Location
    Leganes
  • Print_ISBN
    978-1-4673-1631-6
  • Type

    conf

  • DOI
    10.1109/ISPA.2012.27
  • Filename
    6280286