• DocumentCode
    2635809
  • Title

    Modeling and Single-Pass Simulation of CMP Cache Capacity and Accessibility

  • Author

    Shi, Xudong ; Su, Feiqi ; Peir, Jih-Kwon ; Xia, Ye ; Yang, Zhen

  • Author_Institution
    Dept. of Comput. & Inf. Sci. & Eng., Florida Univ., Gainesville, FL
  • fYear
    2007
  • fDate
    25-27 April 2007
  • Firstpage
    126
  • Lastpage
    135
  • Abstract
    Future chip multiprocessors (CMPs) with a large number of cores face difficult issues in efficiently utilizing on-chip storage space. Tradeoffs between data accessibility and effective on-chip capacity have been studied extensively, but understanding a wide spectrum of design spaces requires costly simulations. In this paper, we first develop an abstract model for understanding the performance impact of the degree of data replication. To overcome the lack of real-time interactions among multiple cores in the abstract model, we propose an efficient single-pass stack simulation method to study the performance of a variety of cache organizations on CMPs. The proposed global stack logically incorporates a shared stack and per-core private stacks to collect shared/private reuse (stack) distances for every memory reference in a single simulation pass. With the collected reuse distances, performance in terms of hits/misses and average memory access times can be calculated for multiple cache organizations. The basic stack simulation results can further be used to derive results for other CMP cache organizations with various degrees of data replication. We verify both the modeling and the stack results against individual execution-driven simulations that consider realistic cache parameters and delays, using a set of commercial multithreaded workloads. We also measure the simulation time saved by the stack simulation. The results show that stack simulation can accurately model the performance of the studied cache organizations with 2-9% error margins using only about 8% of the simulation time. The results also show that the effectiveness of various techniques for optimizing CMP on-chip storage is closely related to the working sets of the workloads as well as the total cache sizes.
  • Keywords
    cache storage; multi-threading; multiprocessing systems; abstract model; average memory access time; chip-multiprocessor; data accessibility; data replication; global stack; multiple cache organization; on-chip cache capacity; on-chip storage space; per-core private stack; reuse distances; shared stack; single simulation pass; single-pass simulation; single-pass stack simulation; Analytical models; Cache storage; Computational modeling; Computer simulation; Delay; Information science; Performance loss; Real time systems; Wiring;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Performance Analysis of Systems & Software, 2007. ISPASS 2007. IEEE International Symposium on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    1-4244-1081-9
  • Electronic_ISBN
    1-4244-1082-7
  • Type

    conf

  • DOI
    10.1109/ISPASS.2007.363743
  • Filename
    4211029