• DocumentCode
    2171906
  • Title
    Modeling memory concurrency for multi-socket multi-core systems
  • Author
    Mandal, Anirban ; Fowler, Rob ; Porterfield, Allan
  • Author_Institution
    Renaissance Comput. Inst., Chapel Hill, NC, USA
  • fYear
    2010
  • fDate
    28-30 March 2010
  • Firstpage
    66
  • Lastpage
    75
  • Abstract
    Multi-core computers are ubiquitous and multi-socket versions dominate as nodes in compute clusters. Given the high level of parallelism inherent in processor chips, the ability of memory systems to serve a large number of concurrent memory access operations is becoming a critical performance problem. The most common model of memory performance uses just two numbers, peak bandwidth and typical access latency. We introduce concurrency as an explicit parameter of the measurement and modeling processes to characterize more accurately the complexity of memory behavior of multi-socket, multi-core systems. We present a detailed experimental multi-socket, multi-core memory study based on the PCHASE benchmark, which can vary memory loads by controlling the number of concurrent memory references per thread. The make-up and structure of the memory have a major impact on achievable bandwidth. Three discrete bottlenecks were observed at different levels of the hardware architecture: limits on the number of references outstanding per core; limits to the memory requests serviced by a single memory controller; and limits on the global memory concurrency. We use these results to build a memory performance model that ties concurrency, latency and bandwidth together to create a more accurate model of overall performance. We show that current commodity memory sub-systems cannot handle the load offered by high-end processor chips.
  • Keywords
    computer architecture; concurrency control; concurrency theory; multiprocessing systems; concurrent memory access operations; concurrent memory references; hardware architecture; memory loads; multi-socket multi-core systems; processor chips; single memory controller; Bandwidth; Clocks; Concurrent computing; Delay; Hardware; Parallel processing; Pervasive computing; Random access memory; Semiconductor device measurement; Yarn
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on
  • Conference_Location
    White Plains, NY
  • Print_ISBN
    978-1-4244-6023-6
  • Electronic_ISBN
    978-1-4244-6024-3
  • Type
    conf
  • DOI
    10.1109/ISPASS.2010.5452064
  • Filename
    5452064