• DocumentCode
    169098
  • Title

    Optimizing Memory Locality Using a Locality-Aware Page Table

  • Author

    Cruz, Eduardo H. M. ; Diener, Matthias ; Alves, Marco A. Z. ; Pilla, Laercio L. ; Navaux, Philippe Olivier Alexandre

  • Author_Institution
    Inf. Inst., Fed. Univ. of Rio Grande do Sul, Porto Alegre, Brazil
  • fYear
    2014
  • fDate
    22-24 Oct. 2014
  • Firstpage
    198
  • Lastpage
    205
  • Abstract
    One of the main challenges for modern parallel shared-memory architectures is access to main memory. In current systems, the performance and energy efficiency of memory accesses depend on their locality: accesses to remote caches and NUMA nodes are more expensive than accesses to local ones. Increasing locality requires knowledge of how the threads of a parallel application access memory pages. With this information, pages can be migrated to the NUMA nodes that access them (data mapping), and threads that access the same pages can be migrated to the same node to improve locality even further (thread mapping). In this paper, we propose LAPT, a mechanism that stores the memory access pattern of parallel applications in the page table, which is updated by the hardware during TLB misses. The operating system uses this information to perform an optimized thread and data mapping during the execution of the parallel application. In contrast to previous work, LAPT requires neither prior information about the behavior of the applications nor changes to the application or runtime libraries. Extensive experiments with the NAS Parallel Benchmarks (NPB) and PARSEC showed performance and energy efficiency improvements of up to 19.2% and 15.7%, respectively (6.7% and 5.3% on average).
  • Keywords
    cache storage; optimisation; parallel architectures; shared memory systems; LAPT; NUMA nodes; data mapping; locality-aware page table; memory access pattern; memory locality optimization; multicore processors; nonuniform memory access; parallel shared-memory architectures; remote caches; Benchmark testing; Hardware; Instruction sets; Memory management; Operating systems; Radiation detectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on
  • Conference_Location
    Jussieu
  • ISSN
    1550-6533
  • Type

    conf

  • DOI
    10.1109/SBAC-PAD.2014.22
  • Filename
    6970665