  • DocumentCode
    3145574
  • Title
    GPU Accelerating for Rapid Multi-core Cache Simulation

  • Author
    Han, Wan ; Xiang, Long ; Xiaopeng, Gao ; Yi, Li

  • Author_Institution
    State Key Lab. of Virtual Reality Technol. & Syst., Beihang Univ., Beijing, China
  • fYear
    2011
  • fDate
    16-20 May 2011
  • Firstpage
    1387
  • Lastpage
    1396
  • Abstract
    To find the best memory system for emerging workloads, traces are collected during an application's execution, and caches with different configurations are then simulated using these traces. Since program traces can be several gigabytes in size, cache performance simulation is a time-consuming process. Compute unified device architecture (CUDA) is a software development platform that enables programmers to accelerate general-purpose applications on the graphics processing unit (GPU). This paper presents a real-time multi-core cache simulator, built on the Pin tool to capture memory references, together with a fast method for multi-core cache simulation on a CUDA-enabled GPU. The proposed method is accelerated by the following techniques: execution parallelism exploration, memory latency hiding, and a novel trace compression methodology. We describe how these techniques can be incorporated into CUDA code. Experimental results show that the hybrid parallel method presented here, which combines time-partitioning with set-partitioning, achieves an 11.10× speedup over the serial CPU simulation algorithm. The simulator can characterize the cache performance of single-threaded or multi-threaded workloads at speeds of 6-15 MIPS, and it can simulate six cache configurations in a single pass at these speeds, whereas CMP$im can simulate only one cache configuration per simulation pass at speeds of 4-10 MIPS.
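    The set-partitioning idea mentioned in the abstract — splitting a memory trace by cache set index so that each set becomes an independent sub-simulation that can run in parallel — can be sketched in plain Python. This is a minimal illustration, not the paper's method: the authors' implementation is in CUDA, and the cache parameters, function names, and LRU policy below are illustrative assumptions.

    ```python
    from collections import OrderedDict

    # Hypothetical cache parameters; the paper's actual configurations are not given here.
    LINE_SIZE = 64      # bytes per cache line
    NUM_SETS = 4        # number of sets in the cache
    ASSOC = 2           # ways (lines) per set

    def simulate_set(tags):
        """Simulate one cache set with LRU replacement; return its miss count."""
        lru = OrderedDict()   # tag -> None, ordered oldest-first
        misses = 0
        for tag in tags:
            if tag in lru:
                lru.move_to_end(tag)          # hit: refresh LRU position
            else:
                misses += 1
                if len(lru) >= ASSOC:
                    lru.popitem(last=False)   # evict least recently used tag
                lru[tag] = None
        return misses

    def simulate_trace(trace):
        # Set-partitioning: split the address trace by set index. Accesses to
        # different sets never interact, so each partition is an independent
        # sub-simulation -- the property a GPU version exploits by assigning
        # sets to parallel threads.
        partitions = [[] for _ in range(NUM_SETS)]
        for addr in trace:
            line = addr // LINE_SIZE
            partitions[line % NUM_SETS].append(line // NUM_SETS)  # keep the tag
        return sum(simulate_set(p) for p in partitions)
    ```

    In a serial run the per-set results are simply summed; the parallel scheme replaces that loop with one worker per set (or per group of sets), which is why partitioning the trace up front is the key step.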
  • Keywords
    cache storage; computer architecture; computer graphic equipment; coprocessors; multiprocessing systems; parallel processing; CUDA; GPU; Pin tool; compute unified device architecture; execution parallelism exploration; graphics processing unit; hybrid parallel method; memory latency hiding; memory system; multicore cache simulation; program traces; set-partitioning; time-partitioning; trace compression methodology; Computational modeling; Graphics processing unit; Instruction sets; Instruments; Parallel processing; Partitioning algorithms;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-61284-425-1
  • Electronic_ISBN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2011.295
  • Filename
    6008993