• DocumentCode
    1647174
  • Title

    Impact of chip-level integration on performance of OLTP workloads

  • Author

    Barroso, Luiz André ; Gharachorloo, Kourosh ; Nowatzyk, Andreas ; Verghese, Ben

  • Author_Institution
    Western Res. Lab., Compaq Comput. Corp., Houston, TX, USA
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    3
  • Lastpage
    14
  • Abstract
    With increasing chip densities, future microprocessor designs have the opportunity to integrate many of the traditional system-level modules onto the same chip as the processor. Some current designs already integrate extremely large on-chip caches, and there are aggressive next-generation designs that attempt to also integrate the memory controller, coherence hardware, and network router all onto a single chip. The tight coupling of these modules will enable efficient memory systems with substantially better latency and bandwidth characteristics relative to current designs. Among the important application areas for high-performance servers, online transaction processing (OLTP) workloads are likely to benefit most from these trends due to their large instruction and data footprints and high communication miss rates. This paper examines the design trade-offs that arise as more system functionality is integrated onto the processor chip, and identifies a number of important architectural choices that are influenced by chip-level integration. In addition, the paper presents a detailed study of the performance impact of chip-level integration in the context of OLTP workloads. Our results are based on full system simulations of the Oracle commercial database engine running on both in-order and out-of-order issue processors used in uniprocessor and multiprocessor configurations. The results show that chip-level integration can improve the performance of both configurations by about 1.4 to 1.5 times, though for different reasons. For uniprocessors, integration of the L2 cache and the resulting lower hit latency is the primary factor in performance improvement. For multiprocessors, the improvement comes from both the integration of the L2 cache (lower L2 hit latency) and the integration of the other memory system components (better dirty remote latency). 
    Furthermore, we find that the higher associativity afforded by integrating the L2 cache plays a critical role in counteracting the loss of capacity relative to larger off-chip caches. Finally, we find that the relative gains from chip-level integration are virtually identical for in-order and out-of-order processors.
  • Keywords
    data mining; multiprocessing systems; performance evaluation; OLTP workloads; Oracle commercial database engine; architectural choices; chip-level integration; coherence hardware; high-performance servers; memory controller; microprocessor designs; multiprocessor configurations; network router; online transaction processing; system functionality; system simulations; system-level modules; uniprocessor; Bandwidth; Databases; Delay; Hardware; Microprocessors; Network servers; Network-on-a-chip; Next generation networking; Out of order; Process design;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High-Performance Computer Architecture, 2000. HPCA-6. Proceedings. Sixth International Symposium on
  • Conference_Location
    Toulouse
  • Print_ISBN
    0-7695-0550-3
  • Type

    conf

  • DOI
    10.1109/HPCA.2000.824334
  • Filename
    824334